Abstract. We consider a temporal logic EF + F^{-1} for unranked, unordered finite trees.
The logic has two operators: EFφ, which says "in some proper descendant φ holds", and
F^{-1}φ, which says "in some proper ancestor φ holds". We present an algorithm for deciding
if a regular language of unranked finite trees can be expressed in EF + F^{-1}. The algorithm
uses a characterization expressed in terms of forest algebras.



1. Introduction 

We say a logic has a decidable characterization if the following decision problem is 
decidable: "given as input a finite automaton, decide if the recognized language can be 
defined using a formula of the logic". Representing the input language by a finite automaton
is a reasonable choice, since many known logics (over words or trees) are captured by finite 
automata. 

This type of problem has been successfully studied for word languages. Arguably best 
known is the result of McNaughton, Papert and Schützenberger [?, 8], which says that
the following three conditions on a regular word language L are equivalent: a) L can be 
defined in first-order logic; b) L can be defined using a star-free expression; and c) the 
syntactic semigroup of L does not contain a non-trivial group. Since condition c) can 
be effectively tested, the above theorem gives a decidable characterization of first-order 
logic. This result demonstrates two important features of work in this field: a decidable 
characterization not only gives a better understanding of the logic in question, but it often 
reveals unexpected connections with algebraic concepts. During several decades of research, 
decidable characterizations have been found for fragments of first-order logic with restricted 
quantification and a large group of temporal logics, see [9] and [15] for references. 

For trees, however, much less is known. No decidable characterization has been found 
for what is possibly the most important subclass of regular tree languages, first-order logic 
with the descendant relation, despite several attempts [10, 17, 2]. Similarly open are chain
logic [?] and the temporal logics CTL, CTL* and PDL. However, there has been some recent
progress. In [5], decidable characterizations were presented for the temporal logics EF and
EX+EF, while Benedikt and Segoufin [?] characterized tree languages definable in first-order
logic with the successor relation (but without the descendant relation). Two new results
give effective characterizations for some fragments of first-order logic with limited quantifier
alternation. The expressive power of alternation-free formulas (i.e. boolean combinations
of formulas with quantifier prefix ∃*) is characterized in [1]. Properties that can be defined
both with quantifier prefix ∃*∀* and also with quantifier prefix ∀*∃* are characterized in [3].
We will come back to the latter class later on in this introduction. 

In this paper, we continue the line of research started in [5], by focusing on a temporal 
logic for trees. We consider a logic called EF + F^{-1}. This logic has two operators: EFφ,
which says "in some proper descendant φ holds", and F^{-1}φ, which says "in some proper
ancestor φ holds". Thanks to the backward modality, EF + F^{-1} is more expressive than EF
alone. For instance, the formula

EF(a ∧ ¬F^{-1}¬b)

defines the class of trees where some node has label a, but all of its ancestors have label b. 
This is a property reminiscent of CTL, and cannot be expressed by only using EF, since it 
fails the identities that must be satisfied by EF-definable languages [6].

The main result in this paper is Theorem 6.2, which gives a decidable characterization
of languages definable in EF + F^{-1}. Before we present this result, in Section 2 we try to
justify the choice of the logic EF + F^{-1}. In Section 3 we present the algebraic formalism
that will be used in the proofs. The rest of the paper is devoted to proving the main result. 

I would like to thank Luc Segoufin. We spent a lot of time together trying to understand 
the expressive power of EF + F^{-1}. Without his input this paper would not have been possible.
I would also like to thank the anonymous referees for their helpful comments. 

2. Why two-way unary temporal logic 

There are two reasons to consider EF + F^{-1}. The first reason is that, over words,
this logic corresponds to an important and well-studied class of regular languages. The
second reason is that, over trees, the logic is related to XML. We go over these reasons in
Sections 2.1 and 2.2 respectively.

2.1. The word analogy. There is a very robust class of regular word languages that has
several equivalent descriptions (a survey of this class can be found in [12]):

(1) Word languages that can be defined in the temporal logic F + F^{-1}. Here Fφ means "in
some future position φ" and F^{-1}φ means "in some past position φ".

(2) Word languages that can be defined by a first-order formula with two variables and the
left-to-right ordering of positions (but without the successor relation).

(3) Word languages that can be defined by a first-order formula (with many variables, the
left-to-right ordering, but without the successor relation) with a ∀*∃* quantifier prefix,
and also by one with an ∃*∀* quantifier prefix.

(4) Word languages whose syntactic semigroup belongs to the semigroup variety DA. One
way of defining this variety is in terms of an identity: DA is the class of semigroups
that satisfy the identity (st)^ω = (st)^ω s (st)^ω.

(5) Word languages described by finite disjoint unions of unambiguous products (a form of
regular expression).

(6) Word languages that can be recognized by "turtle automata", a type of deterministic
two-way word automaton.

(7) Word languages that can be recognized by two-way deterministic automata where the
states in a run are non-decreasing with respect to a given order.

An important corollary of property (4) is that membership of a regular language in the above
class is decidable: it suffices to check if the syntactic semigroup of the language satisfies the 
DA identity. 
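The membership test just described can be sketched as a brute-force check over a multiplication table. The code below is only an illustration: it assumes the syntactic semigroup has already been computed and is given as a table `mul[x][y]`, it uses the identity in the form (st)^ω = (st)^ω s (st)^ω where x^ω denotes the idempotent power of x, and the names `idempotent_power` and `satisfies_da` are hypothetical helpers, not notation from the paper.

```python
from itertools import product

def idempotent_power(x, mul):
    """Return x^omega, the unique idempotent among the powers x, x^2, x^3, ..."""
    p = x
    while mul[p][p] != p:
        p = mul[p][x]  # move from x^k to x^(k+1)
    return p

def satisfies_da(elements, mul):
    """Check (st)^omega = (st)^omega s (st)^omega for all s, t (brute force)."""
    for s, t in product(elements, repeat=2):
        e = idempotent_power(mul[s][t], mul)
        if mul[mul[e][s]][e] != e:
            return False
    return True

# Two toy semigroups on {0, 1}, given by multiplication tables mul[x][y].
u1 = [[0, 0], [0, 1]]  # ordinary multiplication of 0 and 1: a semilattice
z2 = [[0, 1], [1, 0]]  # addition modulo 2: a non-trivial group

print(satisfies_da([0, 1], u1))  # True
print(satisfies_da([0, 1], z2))  # False
```

The check is cubic in the size of the semigroup at worst, which is enough to make the point that the identity is effectively testable.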

Some of the above classes generalize easily to trees, some don't. 

We will not talk about classes (5), (6) and (7). It is not clear what unambiguous expressions
are for trees, likewise for the automata.

We will come back to the algebraic description in item (4) later on in the paper.

The three logically defined classes (1), (2) and (3) can be easily extended to trees. A natural
counterpart of class (1) is the logic EF + F^{-1} considered in this paper. The classes (2) and (3) can
define tree languages if the order is interpreted as the ancestor/descendant ordering of tree
nodes. (One could also consider variants where two partial orders of nodes are available
instead of one: the ancestor/descendant order and also the left-to-right ordering of siblings.
We keep to the simpler case, where siblings are unordered.) The logically defined classes 
diverge for trees: 

• Two-variable logic is strictly more expressive than the temporal logic. The translation
from temporal to two-variable logic is fairly obvious. For the converse, the problem is
that x ≰ y ∧ y ≰ x cannot be expressed in the temporal logic. For instance, the language
"there are two a's" can be defined by a two-variable formula, but cannot be defined in the
temporal logic. This is because the temporal logic is bisimulation invariant, and cannot
see the difference between one child with a and two children with a. (Note however, that
the languages "two a's below some b", or "three a's" cannot be defined in two-variable
logic.)

• As we will show at the end of this paper, the intersection of ∀*∃* and ∃*∀* is incomparable
with both the two-variable and the temporal logic. 

The second fragment has been considered in [3]; the investigation therein shows that
it is a well-behaved class of tree languages. We are left with the temporal logic and two-variable
logic. Why do we choose temporal logic and not two-variable logic? The reason
is that two-variable logic seems to be less robust for trees: why can "two a's" be defined,
but not "three a's"? Of course it is nonetheless important to understand two-variable logic,
and we leave this task as future work.

2.2. XPath. XPath is a formalism used to describe paths and nodes in unranked trees. 
There is a strong connection between XPath and two-variable logics.

A set of paths is seen as a binary relation P(x, y), which says when a source x can be
connected with a target y. The basic idea in XPath is that one starts with atomic paths,
called axes, such as "x is a descendant of y", or "x is a child of y", and then constructs
longer paths using mechanisms such as concatenation. Marx and de Rijke [?] show that a
fragment of XPath called Core XPath has exactly the same expressive power as two-variable
first-order logic. (The equivalence in expressive power is for Boolean queries in XPath and
sentences of two-variable logic. The equivalence also holds for unary queries in XPath and
formulas of two-variable logic with one free variable; but it fails for binary queries.) Note
however, that the axes considered by Marx include child and next-child, which go beyond
the fragments considered in this paper. When the only axes allowed are "descendant" and
"ancestor", Core XPath has exactly the same power as "our" logic EF + F^{-1}. A decidable
characterization for fragments of XPath with the other axes, including the one considered
by Marx, is left as future work.

3. Basic definitions 

3.1. Trees and forests. We work with unranked finite labeled trees. We assume that an 
alphabet (A, B) contains two types of labels: one set of labels A that can be used in the
leaves, and another set of labels B that can be used in inner nodes (i.e. not leaves). This
division is convenient for the algebraic framework we use in general, and for the induction
proof in this paper in particular. Trees are defined as follows: every leaf label a ∈ A is a
tree; if t_1, ..., t_n are trees and b ∈ B is an inner node label then b(t_1 + ⋯ + t_n) is a tree. A
forest is a sequence of trees. As above, we concatenate forests using +. In particular every
forest is of the form t = t_1 + ⋯ + t_n, for some trees t_1, ..., t_n. We do not allow empty
forests, so n ≥ 1. We denote both trees and forests using letters s, t. When b is a label and
t is a forest, we write bt for the tree that has label b in the root, and where the children
form the forest t. In other words, we omit the parentheses and write bt instead of b(t).

A context is a forest where exactly one leaf is labeled by a special label □; this leaf is 
interpreted as a hole. We denote contexts by p, q. The main path in a context consists of 
the ancestors of the hole. A forest t can be substituted in place of the hole of a context p;
the resulting forest is denoted p(t), or sometimes pt.




There is a natural composition operation on contexts: the context pq is the unique 
context such that (pq)t = p(qt) holds for all forests t. We allow the empty context, denoted
by □; this is the context where the only node in the context is the hole □. The empty
context satisfies □t = t. Nodes of trees, forests and contexts are defined the usual way. We
write x, y, z for nodes, and x < y when x is a proper ancestor of y.

The reader will notice that the trees and forests we defined are sibling-ordered (i.e. s + t 
is not the same as t + s). However, properties definable in our logic EF + F^{-1} are going to
be invariant under this order. 

3.2. The logic. The logic EF + F~^ is defined as follows: 

• Every label (both inner node labels and leaf labels) is a formula; this formula holds in
nodes with that label.

• Formulas are closed under boolean combinations, including negation. 

• If φ is a formula, then EFφ is also a formula; it is true in a node x if there is some proper
descendant y > x where φ is true. Likewise for F^{-1}φ, but this time y must be a proper
ancestor y < x.

A formula φ of EF + F^{-1} is most naturally interpreted as a unary query, i.e. in a given
tree it selects a set of nodes. For instance, the formula EFtrue selects all inner nodes. In this
paper, we are interested in tree languages, i.e. boolean queries, where a formula is either
true or false in a given tree. To get a boolean query, we say a formula of EF + F^{-1} is true
in a tree if it is true in its root.
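The semantics above can be made concrete with a small evaluator. This is a minimal sketch under assumptions that are not part of the paper: a tree is encoded as a pair `(label, children)`, a node is identified by its path of child indices, formulas are nested tuples, and the names `nodes`, `select` and `holds` are hypothetical. The example formula is the one from the introduction, "some node has label a, but all of its ancestors have label b".

```python
def nodes(tree, path=()):
    """Yield (path, label) for every node; a path is a tuple of child indices."""
    label, children = tree
    yield path, label
    for i, child in enumerate(children):
        yield from nodes(child, path + (i,))

def is_proper_ancestor(x, y):
    """True if the node at path x is a proper ancestor of the node at path y."""
    return len(x) < len(y) and y[:len(x)] == x

def select(formula, tree):
    """Unary query: the set of nodes (paths) of `tree` where `formula` holds."""
    labels = dict(nodes(tree))
    op = formula[0]
    if op == "true":
        return set(labels)
    if op == "label":
        return {x for x, lab in labels.items() if lab == formula[1]}
    if op == "not":
        return set(labels) - select(formula[1], tree)
    if op == "and":
        return select(formula[1], tree) & select(formula[2], tree)
    if op == "or":
        return select(formula[1], tree) | select(formula[2], tree)
    if op == "EF":    # some proper descendant satisfies the argument
        inner = select(formula[1], tree)
        return {x for x in labels if any(is_proper_ancestor(x, y) for y in inner)}
    if op == "Finv":  # some proper ancestor satisfies the argument
        inner = select(formula[1], tree)
        return {x for x in labels if any(is_proper_ancestor(y, x) for y in inner)}
    raise ValueError(f"unknown operator: {op!r}")

def holds(formula, tree):
    """Boolean query: evaluate the formula in the root (the empty path)."""
    return () in select(formula, tree)

# EF(a AND NOT Finv(NOT b)): some node has label a, all of its ancestors have label b.
phi = ("EF", ("and", ("label", "a"), ("not", ("Finv", ("not", ("label", "b"))))))
t1 = ("b", [("b", [("a", [])]), ("c", [])])   # the tree b(b(a) + c)
t2 = ("d", [("b", [("a", [])])])              # the tree d(b(a))

print(holds(phi, t1))                  # True
print(holds(phi, t2))                  # False
print(select(("EF", ("true",)), t1))   # the inner nodes of t1
```

The evaluator is deliberately naive (it recomputes node sets for every subformula); it is meant to illustrate the difference between the unary query `select` and the boolean query `holds`, not to be efficient.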

The main contribution of this paper is a characterization of the regular tree languages 
that can be defined by a boolean query of EF + F^{-1}. It is, however, natural to also ask for a
characterization of unary queries. For instance, the first unary query below can be defined
in EF + F^{-1}, but the second one cannot:

• Some ancestor of the selected node has label a, i.e. F^{-1}a.

• Some child of the selected node has label a. 

In general, a regular unary query can be given e.g. as a formula of monadic second-order
logic with one free variable. Note that although the second unary query cannot be defined, 
the tree language "some child of the root has label a" can be defined, by the formula 

EF(a ∧ F^{-1}true ∧ ¬F^{-1}F^{-1}true).

This suggests that characterizing unary queries is a nonobvious problem, which we leave as 
future work. 

3.3. Antichain composition principle. A problem with EF + F^{-1} is that it is not closed
under "composition". We illustrate this problem, together with a workaround, for words;
then we show the result for trees.

Consider the word languages aa and (a + b)*. Both are definable in F + F^{-1}, and
even only using F, but the language (a + b)*aa(a + b)* is not. We claim however, that the
concatenation of two definable languages is also definable if the place in the word where
they meet can be uniquely determined in F + F^{-1}:

Lemma 3.1 (Composition for words). Let L, K be two word languages definable in F + F^{-1}
and let φ be a F + F^{-1} formula with the semantic property that in every word, φ holds in
at most one word position. The following word language is also definable in F + F^{-1}:

{a_1 ⋯ a_n : a_1 ⋯ a_i ∈ L, a_{i+1} ⋯ a_n ∈ K, and φ holds in a_1 ⋯ a_n at position i + 1}

Proof. We use relativization. We define ψ_1 by taking the formula defining L, and replacing
each subformula ψ by ψ ∧ Fφ. Likewise, we define ψ_2 by taking the formula defining K,
and replacing each subformula ψ by ψ ∧ (φ ∨ F^{-1}φ). The formula for the language in the
lemma is then ψ_1 ∧ F(φ ∧ ψ_2). □

For trees, the situation is more complicated. First of all, there are two notions of 
composition: concatenation s + t for forests and composition pq for contexts. We are 
interested in generalizing Lemma 3.1 to composition of contexts. In our generalization
though, we may need to substitute many trees simultaneously. This leads to a slightly less 
appealing definition, which follows. 

A formula is called antichain if in every tree, the set of nodes where it holds forms an 
antichain, i.e. a set (not necessarily maximal) of nodes pairwise incomparable with respect 
to the descendant relation. This is a semantic property, and may not be apparent just 
by looking at the syntax of the formula. For instance, the first two formulas below are 
antichain, while the third is not: 

• The node is a leaf: ¬EFtrue.

• The node is a minimal occurrence of b: b ∧ ¬F^{-1}b.

• The node has label b. 

Using antichain formulas, we define our notion of concatenation. The ingredients are: 

• An antichain formula φ.

• Disjoint tree languages L_1, ..., L_n.

• Leaf labels a_1, ..., a_n.

Let t be a tree. We define the tree

t[(L_1, φ) → a_1, ..., (L_n, φ) → a_n]

as follows. For each node x of t where the antichain formula φ holds, we determine the
unique i such that the tree language L_i contains the subtree of x. If such an i exists, we remove
the subtree of x (including x), and replace x by a leaf labeled with a_i. Since φ is antichain,
this can be done simultaneously for all x. Note that the formula φ may depend also on
ancestors of x, while the languages L_i only talk about the subtree of x.
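The substitution just defined can be sketched in a few lines. The sketch is illustrative only and makes several assumptions that are not in the paper: trees are pairs `(label, children)`, the antichain formula is passed as a Python predicate `phi_holds(root, path)` rather than as a formula, each language is passed as a predicate on subtrees paired with its leaf label, and all function names are hypothetical.

```python
def label_at(tree, path):
    """Label of the node reached from the root by the child indices in `path`."""
    for i in path:
        tree = tree[1][i]
    return tree[0]

def substitute(tree, phi_holds, languages, root=None, path=()):
    """Replace each selected subtree that lies in some language by that language's leaf.

    `languages` is a list of (predicate_on_subtrees, leaf_label) pairs, assumed disjoint.
    `phi_holds(root, path)` is assumed to select an antichain of nodes.
    """
    root = tree if root is None else root
    label, children = tree
    if phi_holds(root, path):
        for in_language, leaf in languages:
            if in_language(tree):
                return (leaf, [])
    return (label, [substitute(c, phi_holds, languages, root, path + (i,))
                    for i, c in enumerate(children)])

def minimal_b(root, path):
    """The antichain property 'minimal occurrence of b', hard-coded for this example."""
    if label_at(root, path) != "b":
        return False
    return all(label_at(root, path[:k]) != "b" for k in range(len(path)))

def has_leaf_a(tree):
    label, children = tree
    return (label == "a" and not children) or any(has_leaf_a(c) for c in children)

# c(b(a) + b(b(e)) + a): the two topmost b-nodes are selected.
t = ("c", [("b", [("a", [])]), ("b", [("b", [("e", [])])]), ("a", [])])
langs = [(has_leaf_a, "x"), (lambda s: not has_leaf_a(s), "y")]
print(substitute(t, minimal_b, langs))
# ('c', [('x', []), ('y', []), ('a', [])])
```

Because the selected nodes form an antichain, the recursion never has to look below a replaced node, which is what makes the simultaneous replacement well defined.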

Lemma 3.2 (Antichain composition principle). Let φ, L_1, ..., L_n and a_1, ..., a_n be as
above. If L_1, ..., L_n are tree-definable, and K is a tree-definable language, then so is

{t : t[(L_1, φ) → a_1, ..., (L_n, φ) → a_n] ∈ K}.

Proof. This is proved by a relativization entirely analogous to the one used in Lemma 3.1. □

The point of this lemma is that the languages L_i are taken out of their context inside
the tree t. For instance L_i can say something like: "the root has label a and a child with
label b",

L_i = EF(b ∧ F^{-1}a ∧ ¬F^{-1}F^{-1}true),
while in general the property "a node in the tree that has label b and a child with label b"
cannot be expressed in EF + F^{-1}.

4. Forest algebra 

To represent languages of trees, we will be using forest algebra. We feel that using 
forest algebra instead of automata simplifies the combinatorics used in our characterization. 
Furthermore, when using forest algebra, the key properties from Theorem 6.2 can be stated
in terms of identities. 

Here we only sketch out the definitions and basic properties; the reader is referred to [6] 
for more details. The algebras described in [6] differ slightly from those used here — mainly 
in that we do not allow empty forests here — but the results carry over into this setting. 

A forest algebra is to a regular language of unranked trees as a semigroup is to a regular 
language of words. Formally, a forest algebra is an algebra with two sorts (H, V), along with
some operations that satisfy a number of axioms. While defining the operations and axioms,
we will illustrate them on an important example, called the free forest algebra, where H is 
the set of all nonempty forests, and V is the set of all, possibly empty, contexts. 

The operations and axioms of forest algebra are presented below. Elements of H will 
be denoted by h, g, f and elements of V will be denoted by v, w, u. 

• A composition operation + on H. This operation is required to be associative, i.e. h +
(g + f) = (h + g) + f holds for all f, g, h ∈ H. This makes H a semigroup, called the
horizontal semigroup, and justifies the notation h + g + f. In the free forest algebra, + is
forest concatenation. We do not require H to contain a neutral element, e.g. there is no
empty forest in the free forest algebra.

• A composition operation · on V. Again, this is required to be associative. We omit the
· symbol, writing vw instead of v · w, for v, w ∈ V. Furthermore, we require there to
be a neutral element □ ∈ V, i.e. an element satisfying v · □ = □ · v = v for all v ∈ V.
In particular, V is a monoid, called the vertical monoid. In the free forest algebra, · is
context composition, while □ is the empty context.

• An insertion operation V × H → H. The result of this insertion is denoted by vh ∈ H.
The empty context acts as the identity of this operation, i.e. □h = h. The insertion
operation must be a left action, i.e. it must satisfy (vw)h = v(wh) for v, w ∈ V and
h ∈ H, which justifies the notation vwh. In the free forest algebra, the left action
is substituting a forest into a context. There is a faithfulness requirement: distinct
contexts v, w ∈ V must induce different functions.

• An operation left : H × V → V. This operation must satisfy left(h, v)g = h + vg for v ∈ V
and g, h ∈ H. Thanks to this axiom, we can without ambiguity write h + v to denote
the element left(h, v). In the free forest algebra, h + v is the context obtained from v by
prepending the forest h (next to the root, not the hole). In a similar way we define v + h,
in terms of an operation right.

As demonstrated above, the free forest algebra is a forest algebra. Clearly the free 
algebra depends on the leaf labels A and inner node labels B (and only on these); once 
these are given, the free algebra is denoted by (A, B)^Δ. When describing a forest algebra,
we usually only give names to the carrier sets H and V, leaving the operations implicit. 

Let (H, V) and (G, W) be two forest algebras. A forest algebra morphism

α : (H, V) → (G, W)

is a pair of functions

α = (α_H, α_V)    α_H : H → G    α_V : V → W

that preserve all operations in the signature, namely, composition + in H, composition · in
V, insertion, and the left, right operations. For instance, preserving insertion is:

α_H(vh) = α_V(v)(α_H(h)).

To avoid clutter, we omit the subscripts, and write α(h) instead of α_H(h), likewise for v.

If α is a morphism, then the type under α of a forest t is simply the value α(t). Whenever
the morphism α is clear from the context, we omit the qualifier "under α".

In this paper, a forest algebra will either be a free forest algebra, or a finite forest 
algebra. In the first case, elements of the first sort will be called forests and denoted by 
s, t, while elements of the second sort will be called contexts, and denoted by p, q. In the 
second case, of a finite forest algebra, elements of the first sort will be called forest types 
and denoted by f, g, h, while elements of the second sort will be called context types, and
denoted by u,v,w. 

4.1. Equivalence with regular languages. In this section we show that forest algebras 
provide an equivalent description of regular tree languages. Although this has already been 
shown in [6], we present the proof here for two reasons. First, our definition is slightly
different from the one in [6], where a neutral element was required in H. Second, the notion
of semigroup automaton used in the equivalence will be used later on in the paper.

The point of forest algebras is to recognize forest languages. Let L be a set of forests
over labels (A, B) and let (H, V) be a finite forest algebra. We say a morphism

α : (A, B)^Δ → (H, V)

recognizes a forest language L if membership t ∈ L depends only on the value α(t). In
this case, we also say that the algebra (H, V) recognizes the language L. Note that this
definition is for languages of forests, and not languages of trees, as in the logic EF + F^{-1}.
We will deal with this discrepancy in Section 5.

Below we show that forest algebras recognize exactly the regular forest languages. What 
is a regular forest language? The definition used here, of a semigroup automaton, is chosen 
so that the translation to forest algebra is easiest. A semigroup automaton is a type of 
bottom-up finite automaton that can be used to recognize tree and forest languages. Let 
(A, B) be an alphabet. A semigroup automaton 𝒜 over (A, B) is defined by a finite semigroup
H, whose operation is denoted additively by +, along with two mappings (which
describe the initial states and transitions, respectively):

β_A : A → H    β_B : B → H^H

The purpose of the automaton is to uniquely associate a type β(t) ∈ H to every forest t.
This is done using the following rules:

β(a) = β_A(a)
β(s_1 + ⋯ + s_n) = β(s_1) + ⋯ + β(s_n)
β(bt) = β_B(b)(β(t)).

Recall that in the last line above, bt is a tree that has b in the root and the forest t below. 

An automaton recognizes a forest language L if membership t ∈ L depends only on the
value β(t). In other words, one can choose a set of accepting elements F ⊆ H such that a
forest t belongs to L if and only if the value β(t) belongs to F. The definition can be modified
for recognizing tree languages by requiring the equivalence t ∈ L ⇔ β(t) ∈ F to hold only
for trees. Note that even when recognizing a tree language, a semigroup automaton is still 
obliged to assign a value from H to every forest. 
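The bottom-up evaluation of a semigroup automaton can be sketched as follows. This is an illustrative sketch, not the paper's construction: a forest is encoded as a non-empty list of trees `(label, children)`, the semigroup operation is passed as a Python function `plus`, the leaf mapping as a dict `beta_A`, the transition mapping as a dict of dicts `beta_B` (one function H → H per inner label), and the name `run` is hypothetical. The toy automaton recognizes the forests that contain a leaf with label a.

```python
def run(forest, plus, beta_A, beta_B):
    """Compute the type of a non-empty forest, following the three rules of the automaton."""
    value = None
    for label, children in forest:
        if children:   # inner node b with forest t below: apply beta_B(b) to the type of t
            tree_value = beta_B[label][run(children, plus, beta_A, beta_B)]
        else:          # leaf a: look up beta_A(a)
            tree_value = beta_A[label]
        value = tree_value if value is None else plus(value, tree_value)
    return value

# Toy automaton over H = {0, 1} with + = max: type 1 means "contains a leaf a".
beta_A = {"a": 1, "c": 0}
beta_B = {"b": {0: 0, 1: 1}}   # the inner label b acts as the identity on H
accepting = {1}

f1 = [("b", [("c", []), ("a", [])]), ("c", [])]   # the forest b(c + a) + c
f2 = [("b", [("c", [])])]                         # the tree b(c)

print(run(f1, max, beta_A, beta_B) in accepting)  # True
print(run(f2, max, beta_A, beta_B) in accepting)  # False
```

Note that `run` assigns a value to every forest, including forests of several trees, which matches the remark that a semigroup automaton must do so even when it is used to recognize a tree language.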

It is not difficult to show that this definition is equivalent to other existing automata 
models for unranked trees, although there may be an exponential blowup when translating 
to semigroup automata. 

Theorem 4.1. A forest language is regular if and only if it is recognized by a finite forest 
algebra. 

Proof. Once we have a semigroup automaton, we can extend the mapping β so that contexts
also get values, namely values in H^H. A context p is assigned the following mapping
β(p) ∈ H^H:

h ↦ β(pt),

where t is some forest with β(t) = h (the choice of t does not change this value). It is easy
to see that the mapping β (when seen as a mapping on both forests and contexts) is a forest
algebra morphism

β : (A, B)^Δ → (H, H^H).
This shows the harder direction in the proof of Theorem 4.1. The other direction, from a
forest algebra to a semigroup automaton, is immediate. □

4.2. Syntactic algebra. The syntactic forest algebra of a forest language L is a canonical
forest algebra that recognizes the language. It is defined using the following Myhill-Nerode
equivalence over forests and contexts. Two forests s, t are considered equivalent if for every
context p, either both or neither of ps and pt belong to L. Two contexts p, q are considered
equivalent if for every forest t, the forests pt and qt are equivalent in the above sense.

It turns out that the above defined equivalences are a congruence with respect to all
operations in a forest algebra; therefore a quotient forest algebra can be defined, where
elements of H are equivalence classes of forests, and elements of V are equivalence classes of
contexts. This quotient forest algebra is called the syntactic forest algebra of L. The syntactic
morphism is the morphism that assigns to each forest (resp. context) its equivalence
class. The syntactic morphism recognizes L; furthermore it is optimal in the sense that the
syntactic morphism factors through any morphism recognizing L, i.e. if β is a morphism
recognizing L, and α is the syntactic morphism of L, then there is a (unique) morphism
γ with α = γ ∘ β. In particular, the syntactic forest algebra is a morphic image of any
forest algebra recognizing L, and a language has a finite syntactic algebra if and only if it 
is regular. 

4.3. Green's relations for trees. Fix a forest algebra (H, V). In this section we introduce
two preorders on V and H that will be used in the paper. 

We say that a context type v ∈ V is reachable from a context type w ∈ V if v = wu holds
for some context type u ∈ V. A context component is a maximal set of mutually reachable
context types. Stated differently, two context types v,w are in the same context component 
if the ideals vV and wV are equal. Since reachability is transitive and reflexive, it induces 
an order (not necessarily linear) on context components. 
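For a finite vertical monoid, context components can be computed directly from this definition by comparing right ideals. The sketch below is an illustration under assumptions of our own: the monoid is given as a list `V` of hashable elements with a function `compose(v, w)` returning vw, and the example monoid (all functions on a two-element set) is chosen only because it is small; it is not claimed to be the vertical monoid of any particular language. The name `context_components` is hypothetical.

```python
def context_components(V, compose):
    """Group the elements of a finite monoid V into classes of mutual reachability."""
    # reach[w] is the right ideal wV, i.e. everything reachable from w.
    reach = {w: {compose(w, u) for u in V} for w in V}
    components = []
    for v in V:
        comp = frozenset(w for w in V if v in reach[w] and w in reach[v])
        if comp not in components:
            components.append(comp)
    return components

# Toy monoid: all four functions on {0, 1}, encoded as tuples (f(0), f(1)).
ID, SWAP, C0, C1 = (0, 1), (1, 0), (0, 0), (1, 1)
V = [ID, SWAP, C0, C1]

def compose(v, w):
    """The function vw: first apply w, then v, matching (vw)h = v(wh)."""
    return tuple(v[w[g]] for g in (0, 1))

for comp in context_components(V, compose):
    print(sorted(comp))
```

In this example the two permutations form one component, and each constant function forms a component of its own; the constant functions are reachable from the permutations but not the other way around.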

We say a forest type g ∈ H is reachable from a forest type h ∈ H if g = uh holds
for some context type u ∈ V. A forest component is a maximal set of mutually reachable
forest types. Stated differently, two forest types g, h are in the same forest component if the ideals
Vg and Vh are equal. As for context types, forest components are ordered by reachability.
Note that g + h is reachable from h, since we can take the context type u to be g + □.

These two preorders are related to Green's relations used in semigroup theory. Actually, 
reachability on contexts is simply the R-order on the semigroup V. The reachability relation
on H is not one of Green's relations, since its definition involves the two sorts H and V in 
the forest algebra. 

5. Tree-Definable vs Forest-Definable 

A tree language L is tree-definable if there is a formula of EF + F^{-1} that is true exactly
(in the root of) trees in L. In this paper, it will sometimes be convenient to talk about
EF + F^{-1} formulas defining properties of forests (and not only trees). We say a forest
language L is forest-definable if L is a boolean combination of languages of the form "some
tree in the forest satisfies φ", with φ a formula of EF + F^{-1}. Such a boolean combination
will be called a forest formula. For instance, the following property of a forest t_1 + ⋯ + t_n
is forest-definable: all trees t_1, ..., t_n contain a leaf with label a, and at least one of these
trees has root label b. Any nonempty tree language violates the following property, which
is true for forest-definable languages:

t + t ∈ L iff t ∈ L,
for the simple reason that t + t is not a tree. Therefore no nonempty tree language is
forest-definable. For the same reason, no nonempty forest-definable language is tree-definable.

In this paper, we will present a decidable characterization for forest-definable languages.
Thanks to the following result, this will also give us a decidable characterization of tree- 
definable languages. 

Proposition 5.1. Let L be a tree language over (A, B). The following conditions are
equivalent: 

• L is tree-definable. 

• For each inner node label b ∈ B, the forest language {t : bt ∈ L} is forest-definable.

Proof. We begin by showing that the first property implies the second. Assume then that L
is tree-definable, and fix some b ∈ B. We need to show that the forest language {t : bt ∈ L}
is forest-definable.

Let P be the set of contexts of the form p = b(□ + t), where t is a forest. Consider the
following equivalence relation on trees:

s ∼ t iff ps ∈ L ⇔ pt ∈ L holds for all p ∈ P.

This equivalence relation has only finitely many classes, since it is coarser than the Myhill- 
Nerode equivalence relation used in the definition of syntactic algebra. Note that we would 
get the same equivalence relation by also considering contexts of the form p = b(s + □ + t),
since EF + F^{-1} is invariant under reordering siblings. Furthermore, each of these equivalence
classes is tree-definable, thanks to the following fact: if p is a context and L a tree-definable
language then the set of trees t with pt ∈ L is tree-definable. The standard proof of this
fact is omitted here. For any forest t = t_1 + ⋯ + t_n, membership bt ∈ L only depends on
the equivalence classes under ∼ of the trees t_1, ..., t_n that constitute the forest t. Since
EF + F^{-1} formulas are invariant under duplicating and reordering sibling trees, it is only the
set of equivalence classes that counts, which can be described by a boolean combination of
languages of the form required in forest-definable languages.

We now do the bottom-up implication. It suffices to show that if a forest language L 
is forest-definable, then for any inner node label b ∈ B, the tree language {bt : t ∈ L} is
tree-definable. The key step is that if a tree language K is tree-definable, then the following
tree language:

XK = {b(t_1 + ⋯ + t_n) : b ∈ B, ∃i. t_i ∈ K}

is also tree-definable. Once we demonstrate how to write a formula for XK, the formula 
tree-defining bL can be obtained from the formula forest-defining L. 

Note that definability of the language XK does not mean we can add the child operator 
to the logic. This is because XK uses the child only at a fixed depth. For instance, the 
property "some node at depth 4 has the same label as its parent" is tree-definable, contrary 
to the property "some node has the same label as its parent". 

The formula for XK can be obtained from the antichain composition principle, but 
we do a direct construction here. Let (p be the formula defining K. We define <f to be 
the formula obtained from ip by replacing every subformula ^ by ^ A F~^true. This way, 
quantification in (p is relativized to non-root nodes. Finally, the formula for XK is 

Ef{{F-^ true) A {-^F~^F~^true) A ip) . 



TWO-WAY UNARY TEMPORAL LOGIC OVER TREES 



11 



The above formula nondeterministically picks a successor $x$ of the root, and then tests if
$\varphi'$ holds in $x$. Since $\varphi'$ is relativized to non-root nodes, evaluation of $\varphi'$ will never leave the
subtree of $x$. □
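
The construction above can be tried out on small trees. The following Python sketch is only an illustration: the tuple encoding of formulas, the names `Node`, `tree`, `holds`, `relativize` and `xk_formula`, and the sample language $K$ are assumptions made for this example and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None


def tree(label, *children):
    """Build a node and set the parent pointer of each child."""
    node = Node(label, list(children))
    for child in children:
        child.parent = node
    return node


def descendants(x):
    """Proper descendants of x."""
    for child in x.children:
        yield child
        yield from descendants(child)


def ancestors(x):
    """Proper ancestors of x."""
    while x.parent is not None:
        x = x.parent
        yield x


TRUE = ("true",)
NONROOT = ("Finv", TRUE)  # F^{-1} true: the node has a proper ancestor


def holds(f, x):
    """Evaluate a formula (hypothetical tuple encoding) in node x."""
    op = f[0]
    if op == "true":
        return True
    if op == "label":
        return x.label == f[1]
    if op == "not":
        return not holds(f[1], x)
    if op == "and":
        return holds(f[1], x) and holds(f[2], x)
    if op == "EF":
        return any(holds(f[1], y) for y in descendants(x))
    if op == "Finv":
        return any(holds(f[1], y) for y in ancestors(x))
    raise ValueError(op)


def relativize(f):
    """Replace every subformula psi by (psi and F^{-1} true)."""
    op = f[0]
    if op in ("true", "label"):
        g = f
    elif op == "not":
        g = ("not", relativize(f[1]))
    elif op == "and":
        g = ("and", relativize(f[1]), relativize(f[2]))
    else:  # "EF" or "Finv"
        g = (op, relativize(f[1]))
    return ("and", g, NONROOT)


def xk_formula(phi):
    """EF( F^{-1}true and not F^{-1}F^{-1}true and relativized phi )."""
    depth_one = ("and", NONROOT, ("not", ("Finv", NONROOT)))
    return ("EF", ("and", depth_one, relativize(phi)))


# Sample K: the root has label a and some descendant has no ancestor labelled c.
phi = ("and", ("label", "a"), ("EF", ("not", ("Finv", ("label", "c")))))
```

On the tree $c(a(b))$ the relativized formula holds at the root, matching the fact that $a(b)$ satisfies `phi`, whereas plugging in `phi` without relativization fails because the ancestor test escapes to the root labelled $c$.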



6. The identities and the main result 

In this section we state our main result, the decidable characterization of the logic
EF + F$^{-1}$.

The characterization uses a relation $\dashv$ over contexts in a forest algebra. The idea is
that $u \dashv w$ holds if the context $u$ can be obtained from the context $w$ by removing forests
that are siblings of the main path (recall that the main path contains ancestors of the hole).
Let $(H, V)$ be a forest algebra. For $u, w \in V$, we write $u \dashv w$ if $u, w$ can be decomposed as

    $u = v_0 v_1 \cdots v_n \qquad w = v_0 (h_1 + v_1) \cdots (h_n + v_n)$

for some $v_0, \ldots, v_n \in V$ and $h_1, \ldots, h_n \in H$. The reason why we have $v_0$ above, and not
$h_0 + v_0$, is that a context type can be empty, but there is no empty forest type. The following
lemma shows that the relation $\dashv$ can be calculated in polynomial time using a least fixpoint
algorithm:

Lemma 6.1. The relation $\dashv$ is the least relation $R \subseteq V \times V$ such that:
    $(v, v),\ (v, v + h),\ (v, h + v) \in R$ for $v \in V$, $h \in H$
    $(v, v'), (w, w') \in R \Rightarrow (vw, v'w') \in R$ for $v, v', w, w' \in V$.

Proof. The implication from $(v, w) \in R$ to $v \dashv w$ is proved by induction on the number
of steps in the derivation. The converse implication is proved by induction on $n$ in the
definition of $\dashv$. □
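
The least fixpoint of Lemma 6.1 can be sketched in Python as follows. This is a naive illustration, not an optimized algorithm: the function name `dashv_relation` and the way the algebra is passed in (as lists `V`, `H` and three callables for context composition, $h + v$ and $v + h$) are assumptions made for this example.

```python
from itertools import product


def dashv_relation(V, H, compose, plus_left, plus_right):
    """Least relation R containing (v, v), (v, v + h), (v, h + v)
    and closed under (v, v'), (w, w') -> (vw, v'w')."""
    R = set()
    for v in V:
        R.add((v, v))
        for h in H:
            R.add((v, plus_right(v, h)))  # the pair (v, v + h)
            R.add((v, plus_left(h, v)))   # the pair (v, h + v)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot of R, since R grows during the loop.
        for (v, v2), (w, w2) in product(list(R), repeat=2):
            pair = (compose(v, w), compose(v2, w2))
            if pair not in R:
                R.add(pair)
                changed = True
    return R
```

Each pass of the loop either adds a pair or terminates, and there are at most $|V|^2$ pairs, so the number of passes is polynomial in the size of the algebra. As a toy check, one can take context types to be subsets of a label set, forest types to be the nonempty subsets, and all operations to be union; the relation computed is then set inclusion.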

The relation $\dashv$ is transitive in some forest algebras, including all free forest algebras.
However, in general it need not be transitive, as illustrated by the following example. Let
the leaf alphabet $A$ be $\{a_1, a_2\}$ and let the inner node alphabet $B$ be $\{b\}$. Consider the
forest language $L$: "the forest does not contain both labels $a_1$ and $a_2$ at the same time, and
every node with label $b$ has a sibling with label $a_1$ or $a_2$". Let $\alpha$ be the syntactic morphism
of this language. Consider the following four contexts:




Clearly we have $\alpha(p_1) \dashv \alpha(p_2)$ and $\alpha(q_1) \dashv \alpha(q_2)$. We claim that $\alpha(p_2) = \alpha(q_1)$. Indeed,
both contexts are "error" contexts, i.e. for any context $r$ and forest $t$ we have $rp_2t, rq_1t \notin L$.
Therefore, if $\dashv$ were a transitive relation, we would have $\alpha(p_1) \dashv \alpha(q_2)$. This, however,
cannot hold, since otherwise we could construct a tree in $L$ with both $a_1$ and $a_2$ labels.

We are now ready to state the main theorem of this paper:

Theorem 6.2. A language is forest-definable in EF + F$^{-1}$ if and only if its syntactic algebra
satisfies the following identities:

    $h + h = h \qquad g + h = h + g$    (6.1)

    $(vw)^\omega = (vw)^\omega w (vw)^\omega$    (6.2)

    $(u_1 w_1)^\omega (u_2 w_2)^\omega = (u_1 w_1)^\omega u_1 w_2 (u_2 w_2)^\omega$  if $u_1 \dashv u_2$, $w_1 \dashv w_2$    (6.3)

In the identities above, all variables are quantified universally. The identities in (6.1) say
that children can be duplicated and reordered. This corresponds to bisimulation invariance
in the following way: a forest language is bisimulation invariant if and only if its syntactic
forest algebra satisfies (6.1). The identity (6.2) says that the vertical monoid belongs to
the variety DA (although the commonly used identity is different). Only the last identity
is new.

The exponent $\omega$ in properties (6.2) and (6.3) stands for "for almost all $n$". In particular,
identity (6.2) should be read as:

    $\exists m\ \forall n \ge m \quad (vw)^n = (vw)^n w (vw)^n$.

Usually in semigroup theory, $\omega$ stands for "least idempotent power", but the above definition
is equivalent for aperiodic monoids, which is the case here, thanks to (6.2).

An important corollary of the above theorem is that definability in EF + F$^{-1}$ is decidable:

Corollary 6.3. It is decidable if a forest (resp. tree) language is forest-definable (resp.
tree-definable) in EF + F$^{-1}$. The algorithm runs in polynomial time if the input is given as a
forest algebra.

Proof. To determine if a language is tree-definable, we calculate the languages $\{t : bt \in L\}$
and reduce to the characterization of forest-definable languages thanks to Proposition 5.1.
Therefore, we focus on deciding if a language is forest-definable.

We begin by finding the syntactic forest algebra. The syntactic forest algebra can 
be effectively calculated based on any representation of the tree language, be it a tree 
automaton, or a formula of some rich logic, such as MSO. In general, the syntactic forest 
algebra can be exponentially larger than a nondeterministic tree automaton, not to mention 
a formula of MSO. 

Once the syntactic forest algebra has been calculated, the properties (6.1), (6.2) and (6.3)
can be verified in polynomial time (with respect to the algebra). The relation $\dashv$ over $V$ can
be computed in polynomial time thanks to Lemma 6.1. The exponent $\omega$ is not a problem.
Indeed, a consequence of (6.2) is that $V$ is aperiodic, i.e. the identity $v^\omega = v^\omega v$ holds for all
context types $v$. In particular, it is enough to test the identities for a single, sufficiently
large value of $\omega$. □
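
As an illustration of the verification step, the following Python sketch checks identity (6.2) on a finite monoid by brute force. It reads $\omega$ as the least idempotent power, which the text above notes is equivalent in the aperiodic case; the names `omega_power` and `satisfies_6_2`, and the representation of the monoid as a list of elements plus a multiplication function, are assumptions made for this example.

```python
from itertools import product


def omega_power(x, mul):
    """Return the idempotent power of x in a finite monoid.

    Terminates because some power of every element of a finite
    monoid is idempotent.
    """
    p = x
    while mul(p, p) != p:
        p = mul(p, x)
    return p


def satisfies_6_2(V, mul):
    """Check (vw)^omega = (vw)^omega w (vw)^omega for all v, w in V."""
    for v, w in product(V, repeat=2):
        e = omega_power(mul(v, w), mul)
        if e != mul(mul(e, w), e):
            return False
    return True
```

The check uses a number of multiplications polynomial in $|V|$. For instance, subsets of a set under union pass the test, while the two-element group fails it, as expected for a monoid that is not aperiodic.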

The rest of this paper is devoted to showing Theorem 6.2. The "only if" implication
in the above theorem is proved in Section 7 using a simple induction on formula size. The
difficult part is the proof of the "if" implication, which is found in Section 8.

In the following fact, we show that property (6.3) in Theorem 6.2 is not redundant. In
a similar way one can prove that neither (6.1) nor (6.2) is redundant.

Lemma 6.4. There exists a forest algebra satisfying properties (6.1) and (6.2) but not (6.3).

Proof. Let the leaf alphabet $A$ be $\{a_1, a_2\}$ and let the inner node alphabet $B$ be $\{b\}$.
Consider the following language: "if a node has a child with label $a_1$, then it has an
ancestor with a child with label $a_2$". The syntactic forest algebra of this language satisfies
properties (6.1) and (6.2), but it does not satisfy (6.3), since for all $n \in \mathbb{N}$ we have

    $(bb)^n ((b + a_2)(b + a_1))^n a_2 \in L \qquad (bb)^n b (b + a_1) ((b + a_2)(b + a_1))^n a_2 \notin L$. □

7. Correctness

In this section we show that any language forest-definable in EF + F$^{-1}$ satisfies the
identities from Theorem 6.2. For each of these identities we show that any formula of
EF + F$^{-1}$ must, informally speaking, confuse the two trees described by the opposing sides
of the identity. To show this confusion, we use an Ehrenfeucht-Fraïssé game. The plan
of this section is as follows. First, in Section 7.1, we define the Ehrenfeucht-Fraïssé game that
characterizes EF + F$^{-1}$. Next, in Section 7.2, we use the game to show that languages
defined in EF + F$^{-1}$ are closed under morphic preimages. Finally, in Section 7.3, we show
that any language forest-definable in EF + F$^{-1}$ satisfies the identities from Theorem 6.2.

7.1. Ehrenfeucht-Fraïssé Game. In this section, we define an Ehrenfeucht-Fraïssé game
that characterizes the logic EF + F$^{-1}$.

The game is played on two forests $s_0$ and $s_1$, with two distinguished nodes, $x_0$ in $s_0$
and $x_1$ in $s_1$. A configuration of the game is therefore a four-tuple $(x_0, x_1, s_0, s_1)$. Finally,
the game has a parameter $n \in \mathbb{N}$, which is called the number of rounds. The game is played
by two players, Duplicator and Spoiler. The idea is that Duplicator claims that the same
formulas of size at most $n$ hold in $x_0$ and $x_1$.

The game is played as follows. Assume that there are $n \ge 0$ rounds left. If the labels
of $x_0, x_1$ are different, then Spoiler wins the game immediately, and no further rounds are
played. If the labels are the same, and $n = 0$, then Duplicator wins the game, and no
further rounds are played. Finally, if the labels are the same and $n > 0$, a new round is
played as follows.

First, Spoiler chooses one of the two nodes $x_0, x_1$, i.e. he chooses an index $i \in \{0, 1\}$.
The idea is that Spoiler thinks that the node $x_i$ has some property that the other node $x_{1-i}$
does not have. He then chooses to make either a descendant move (in this case, Spoiler
thinks that $x_i$ has a descendant unlike all descendants of $x_{1-i}$) or an ancestor move (Spoiler
thinks that $x_i$ has an ancestor unlike all ancestors of $x_{1-i}$). If Spoiler chooses a descendant
(respectively, ancestor) move, then he must choose a proper descendant (respectively, proper
ancestor) $y_i$ of $x_i$ in the forest $s_i$. To this, Duplicator must respond by choosing a proper
descendant (respectively, proper ancestor) $y_{1-i}$ of $x_{1-i}$ in the other forest $s_{1-i}$. The idea
is that Duplicator thinks that $y_{1-i}$ is similar to $y_i$, at least as far as the remaining $n - 1$
rounds are concerned. Formally, the new configuration becomes $(y_0, y_1, s_0, s_1)$ and the game
continues with $n - 1$ rounds left.

We also define how the $n$-round game is played on two forests $s_0, s_1$ in case when the
nodes $x_0, x_1$ are not specified. In this case, there is a special introductory round, where
Spoiler chooses $i \in \{0, 1\}$ and a root node $x_i$ in $s_i$; Duplicator responds with a root node
$x_{1-i}$ in the other forest. Then the standard $n$-round game continues from this configuration.
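
The rules above can be turned into a brute-force procedure that decides the winner on small forests. The Python sketch below is only an illustration and takes exponential time in the number of rounds. The names `Node`, `tree`, `duplicator_wins` and `duplicator_wins_forests` are assumptions made for this example, as is the convention that Spoiler wins a round in which Duplicator has no legal response.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None


def tree(label, *children):
    """Build a node and set the parent pointer of each child."""
    node = Node(label, list(children))
    for child in children:
        child.parent = node
    return node


def proper_descendants(x):
    out, stack = [], list(x.children)
    while stack:
        y = stack.pop()
        out.append(y)
        stack.extend(y.children)
    return out


def proper_ancestors(x):
    out = []
    while x.parent is not None:
        x = x.parent
        out.append(x)
    return out


def duplicator_wins(n, x0, x1):
    """Does Duplicator win the n-round game from the nodes x0, x1?"""
    if x0.label != x1.label:
        return False
    if n == 0:
        return True
    for moves in (proper_descendants, proper_ancestors):
        # Spoiler may move in either forest; the game is symmetric.
        for spoiler_side, duplicator_side in ((x0, x1), (x1, x0)):
            for y in moves(spoiler_side):
                # Assumption: no legal response means Spoiler wins.
                if not any(duplicator_wins(n - 1, y, z)
                           for z in moves(duplicator_side)):
                    return False
    return True


def duplicator_wins_forests(n, roots0, roots1):
    """The n-round game with the introductory round on root nodes."""
    for side_a, side_b in ((roots0, roots1), (roots1, roots0)):
        for x in side_a:
            if not any(duplicator_wins(n, x, y) for y in side_b):
                return False
    return True
```

For example, the trees $b(a)$ and $b(a + a)$ cannot be distinguished in any number of rounds, while Spoiler wins the 1-round game on $b(a)$ and $b(b(a))$ by moving to the inner $b$ node of the second tree.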

Proposition 7.1. A forest language $L$ is forest-definable in EF + F$^{-1}$ if and only if for some $n$,
Spoiler wins the $n$-round game for any pair of forests $s_0 \in L$ and $s_1 \notin L$.

Proof. The proof is standard, and omitted here. The idea is that $n$ is the nesting depth of
the formulas used to forest-define $L$. The nesting depth counts the maximal nesting of EF
and F$^{-1}$ in a formula, while boolean operations are for free. □

7.2. Morphic images. In this section, we show that languages forest-definable in EF + F$^{-1}$
are closed under morphic preimages. Actually, we show a slightly more general result. The
more general setting will be used in Section 9, where we show that our characterization also
works for a different model of forest algebra, where empty forests are allowed.

We first describe the more general setting. The generalization is twofold. First, we
allow empty forests. Second,* we consider forests over a single alphabet (unlike the
two-sorted alphabet $A, B$ considered before, with $A$ allowed only in leaves and $B$ allowed only
in inner nodes). The new type of forests will be called one-sorted forests, to distinguish
them from the two-sorted forests considered before. The one-sorted forests are more general
in the following sense: the two-sorted forests over an alphabet $(A, B)$ are a subset of the
one-sorted forests over the alphabet $A \cup B$. Of course, the difference is not that big: the
one-sorted forests over $A$ are the two-sorted forests over $(A, A)$, plus the empty forest. We
also have an analogous concept of one-sorted contexts. A one-sorted morphism, with source
alphabet $A$ and target alphabet $B$, is given by a function that assigns to each letter of $A$ a
one-sorted context, possibly empty, over $B$. A one-sorted morphism uniquely extends to
one-sorted forests and one-sorted contexts. To avoid confusion, in this section we use the
name two-sorted morphism for the morphisms introduced previously in the paper.

Theorem 7.2. Let $\alpha$ be a one-sorted morphism. If a forest language $L$ over the target
alphabet $B$ is forest-definable in EF + F$^{-1}$, then so is its inverse image $\alpha^{-1}(L)$.

The version of this theorem for two-sorted morphisms is a special case of the one-sorted
version, since for every two-sorted morphism there is a one-sorted morphism that gives the
same results over all legal two-sorted forests.

To show this theorem, we will use the Ehrenfeucht-Fraïssé game. We fix the forest
language $L$ and the (one-sorted) morphism $\alpha$ from the theorem for the rest of this section.
Let $n$ be the number of rounds obtained by applying Proposition 7.1 to the forest language $L$ in the
statement of the theorem. By invoking Proposition 7.1 a second time, to establish that
the inverse image $\alpha^{-1}(L)$ is forest-definable in EF + F$^{-1}$, it suffices to show that Spoiler
can win the $n$-round game over any two forests, one taken from the preimage $\alpha^{-1}(L)$,
and the other taken from its complement. The proof will be by showing how a strategy of
Duplicator over the preimage can be lifted to a strategy over the image, as stated in the
following proposition.

Proposition 7.3. If Duplicator wins the $n$-round game over $s_0, s_1$, then Duplicator also
wins the $n$-round game over $\alpha(s_0), \alpha(s_1)$.

To prove this transfer of strategies, we will be switching back and forth between the
Ehrenfeucht-Fraïssé games on $s_0, s_1$ and on $\alpha(s_0), \alpha(s_1)$. To avoid confusion, we use the
name preimage game for the former and we use the name image game for the latter. We
will be comparing configurations of the two games in the following way. Every node $x$ in a
morphic image $\alpha(s)$ can be uniquely identified by two pieces of information: its preimage
$\bar{x}$, which is a node in the preimage forest $s$, and its offset, which is a node of the context
assigned by $\alpha$ to the label of $\bar{x}$. These concepts are illustrated below, in an example where
both the source and target alphabets are $\{a, b\}$, and the one-sorted morphism is defined by
$\alpha(a) = a(\Box + b)$ and $\alpha(b) = \Box$.



* It turns out that in forest algebra, the first generalization entails the second.

[Figure: a forest $s$ and its image $\alpha(s)$, with the offset of $x$ and the offset of $y$ marked.]


Note that some nodes in the preimage forest $s$ are not the preimage of any node in $\alpha(s)$;
these are the nodes whose labels are mapped to an empty context by $\alpha$.

Armed with the definitions of offset and preimage, we now prove the strategy transfer
from Proposition 7.3. We only give the main invariant, which is described below. The
missing part of the proof, for the introductory round of the game where the root nodes are
chosen, is done in a similar way.

Lemma 7.4. Let $m \le n$. Let $x_0, x_1$ be nodes with the same offset such that $\bar{x}_0, \bar{x}_1$ have the
same label. If Duplicator can win the $n$-round preimage game in configuration $(\bar{x}_0, \bar{x}_1, s_0, s_1)$,
then he can also win the $m$-round image game in configuration $(x_0, x_1, \alpha(s_0), \alpha(s_1))$.

Proof. The proof is by induction on $n$. Consider first the case of $n = 0$. By assumption on
the preimage game, the nodes $\bar{x}_0$ and $\bar{x}_1$ have the same labels in $s_0, s_1$. Since the two nodes
$x_0, x_1$ have the same offsets, they must also have the same labels in the images $\alpha(s_0), \alpha(s_1)$,
and therefore Duplicator wins.

Consider now the induction step. We only do the case when Spoiler chooses a descendant
move; the ancestor move is done the same way. Assume then that Spoiler chooses $x_i$
and indicates a proper descendant $y_i$ of $x_i$ in $\alpha(s_i)$. How should Duplicator respond? There
are two possible cases:

• The preimage $\bar{y}_i$ is a proper descendant of $\bar{x}_i$. We now go to the preimage game, and
make Spoiler play a descendant move where he chooses $\bar{y}_i$. By assumption on Duplicator
winning the preimage game, there is a proper descendant of $\bar{x}_{1-i}$, call it $\bar{y}_{1-i}$, such that
Duplicator wins the $(n-1)$-round preimage game from configuration $(\bar{y}_0, \bar{y}_1, s_0, s_1)$. In
particular, the nodes $\bar{y}_0, \bar{y}_1$ have the same labels in the preimage, and therefore the same
possible offsets in the image. Therefore, there exists a node $y_{1-i}$ in $\alpha(s_{1-i})$ such that its
preimage is $\bar{y}_{1-i}$, and this node can be chosen to have the same offset as $y_i$. We now use
the induction assumption to show that Duplicator wins the rest of the image game from
configuration $(y_0, y_1, \alpha(s_0), \alpha(s_1))$.

• If the preimage $\bar{y}_i$ is not a proper descendant of $\bar{x}_i$, then $\bar{y}_i = \bar{x}_i$ and the only difference
between $y_i$ and $x_i$ is in the offset. Duplicator's response is to choose in the image $\alpha(s_{1-i})$
a node $y_{1-i}$ that has the same offset as $y_i$, and such that $\bar{y}_{1-i} = \bar{x}_{1-i}$. We then use the
induction assumption to show that Duplicator wins the rest of the image game. □

7.3. Correctness of the identities. We are now ready to show the easier implication
in Theorem 6.2, namely that the syntactic forest algebra of a language forest-definable in
EF + F$^{-1}$ satisfies the three identities. Validity of (6.1) can easily be shown. We omit the
proof of (6.2) for two reasons: first, it is the same as in the word case, see e.g. [13]; and
second, it follows along similar lines as the proof of (6.3).

The rest of this section is devoted to showing the validity of identity (6.3). Let $L$ be
a forest language forest-definable in EF + F$^{-1}$. We need to show that the syntactic algebra
of $L$ satisfies identity (6.3). Recall that elements of the syntactic algebra are equivalence
classes of the Myhill-Nerode equivalence relation. Therefore, in order to show the validity
of (6.3), we have to show that for any formula $\varphi$ of EF + F$^{-1}$, for all contexts $p_1 \dashv p_2$ and
$q_1 \dashv q_2$, every context $p$ and every nonempty forest $t$, for almost all $n \in \mathbb{N}$ the formula $\varphi$ is
true in some tree of either both or neither of the forests

    $s_0 = p (p_1 q_1)^n (p_2 q_2)^n t \qquad s_1 = p (p_1 q_1)^n p_1 q_2 (p_2 q_2)^n t$.    (7.1)

We will use the Ehrenfeucht-Fraïssé game, and show that Duplicator can win the $n$-round
game over the above two forests. To keep notation simple, we assume that the following
simplifying assumptions are met.

• The context $p$ is a single node $b$ (in particular, $s_0$ and $s_1$ are trees).

• The forest $t$ is a single node $a$.

• The contexts $p_1, p_2, q_1, q_2$ are

    $p_1 = b_1 \cdots b_k \qquad p_2 = b_1(a_1 + \Box) \cdots b_k(a_k + \Box)$

    $q_1 = b_{k+1} \cdots b_m \qquad q_2 = b_{k+1}(a_{k+1} + \Box) \cdots b_m(a_m + \Box)$

  for some $k < m$ and $b_1, \ldots, b_m \in B$, $a_1, \ldots, a_m \in A$.

• The labels $a, a_1, \ldots, a_m, b, b_1, \ldots, b_m$ are all distinct.

The trees $s_0$ and $s_1$ are shown in Figure 1. Why can we make these simplifying assumptions?
The reason is that the general case follows from this special case by way of homomorphic
images. More specifically, consider the two forests $s_0, s_1$ in the general case, as given in (7.1).
We want to show that Duplicator wins the $n$-round game over these two forests. The key
observation is that any two forests $s_0, s_1$ as in (7.1) can be obtained as homomorphic images
$s_0 = \alpha(t_0)$ and $s_1 = \alpha(t_1)$ from trees $t_0, t_1$ that satisfy the simplifying assumptions, for some
(two-sorted) morphism $\alpha$. As long as we know how Duplicator can win the game over the
simpler trees $t_0, t_1$, we can use Proposition 7.3 to transfer this result to the forests $s_0, s_1$.

We now proceed to describe a winning strategy for Duplicator over trees $s_0, s_1$ that
satisfy the simplifying assumptions. We use the term main path for the ancestors of the
node $a$. The projection of a node onto the main path is its closest ancestor (not necessarily
proper) that is on the main path. For a node in either $s_0$ or $s_1$, the ancestor block count
(respectively, descendant block count) is the number of ancestors with label $b_m$ (respectively,
descendants with label $b_1$) of the node's projection onto the main path. For $m \le n$, we say that
two nodes $x_0, x_1$ in the trees $s_0, s_1$ are $m$-similar if their labels are the same and moreover
one of the conditions in the following invariant holds:

(1) The trees $s_0, s_1$ agree on nodes in the subtrees of $x_0, x_1$; or

(2) The trees $s_0, s_1$ agree on nodes not in the subtrees of $x_0, x_1$; or

(3) The ancestor and descendant block counts of $x_0, x_1$ are both at least $m$.

Lemma 7.5. Let $m \le n$. If the nodes $x_0, x_1$ are $m$-similar, then Duplicator wins the
$m$-round game from configuration $(x_0, x_1, s_0, s_1)$.

Proof. The proof is by induction on $m$. For the base case $m = 0$ we use the assumption
that the labels are the same. Consider now the induction step. We only do one case, when
Spoiler chooses a descendant move to go from $x_1$ to a node $x'_1$ in the "new block" of $s_1$ (the
new block is the context $p_1 q_2$). This Spoiler move means that $x_0, x_1$ are $m$-similar for reason
(2) or (3), since item (1) forbids a descendant of $x_1$ in the new block. What is Duplicator's
response? Note that for all nodes in the new block, both the ancestor and descendant block
counts are at least $n \ge m - 1$. Duplicator goes to any node $x'_0$ in the tree $s_0$ where the
ancestor and descendant block counts are both at least $m - 1$. This must be possible, since
either one of items (2) or (3) of the invariant was true for $x_0, x_1$. The rest of the game is
played according to the induction assumption, since $x'_0$ and $x'_1$ are $(m-1)$-similar. □

Figure 1: The trees $s_0$ and $s_1$

By taking $m = n$ in the above lemma, we get the desired result. This is because the two
roots of $s_0, s_1$ have the same (empty) prefixes, thus they are $m$-similar, and must therefore
satisfy the same formulas of size $m = n$.

8. Completeness

This section is devoted to showing:

Proposition 8.1. Any forest language recognized by a forest algebra satisfying (6.1), (6.2)
and (6.3) can be forest-defined.

The above statement immediately implies the more difficult "if" part of Theorem 6.2.
Indeed, if $L$ is recognized by an algebra satisfying (6.1), (6.2) and (6.3), then its syntactic
algebra satisfies these identities. This is because the syntactic algebra is a morphic image
of any algebra recognizing the language, and identities are preserved by morphic images.

Let $X \subseteq H$ be a set of forest types. We say a forest $t$ is $X$-trimmed if the only subtrees
of $t$ that have a type in $X$ are leaves. We say a tree language $L$ is tree-definable modulo $X$
if there is a formula $\varphi$ such that

    $t$ satisfies $\varphi$ iff $t \in L$

holds for all $X$-trimmed trees $t$ (for other trees, $\varphi$ may disagree with $L$). In a similar fashion,
we define a forest language that is forest-definable modulo $X$.

Instead of Proposition 8.1, we show the slightly more general result below, which contains
the induction parameters that appear in the proof.

Proposition 8.2. Let $\alpha : (A, B)^\Delta \to (H, V)$ be a morphism, with $(H, V)$ satisfying
identities (6.1), (6.2) and (6.3). Let $X \subseteq H$ be a set of forest types, and let $v \in V$ be a context
type. For each forest type $h \in H$ the following forest language is forest-definable modulo $X$:

    $\{t : v(\alpha(t)) = h\}$.    (8.1)

For the rest of Section 8 we fix $\alpha : (A, B)^\Delta \to (H, V)$, $h \in H$, $v \in V$ and $X \subseteq H$ from
Proposition 8.2. Clearly Proposition 8.1 follows from the above result, taking $X = \emptyset$, $v$ to
be the empty context type $\Box$, and doing a disjunction over all forest types $h \in \alpha(L)$. The
rest of Section 8 is devoted to a proof of Proposition 8.2. The proof is by induction on four
parameters:

(1) The size of $H$, i.e. the number of all forest types.

(2) The size of $H \setminus X$, i.e. the number of forest types that can be found outside leaves.

(3) The size of $vV$, i.e. the number of context types reachable from $v$.

(4) The size of $B$, i.e. the number of inner node labels.

The order of these parameters is important: first we try to minimize H, then the other three 
parameters (the order for the other three is not important). Note that the last parameter 
depends on the alphabet B, and the notion "modulo X" depends on the morphism. 

We say a morphism $\alpha$ into $(H, V)$ is leaf saturated if for every $h \in H$, there is a
representative leaf label $a$ whose type $\alpha(a)$ is $h$. In the rest of this section, we will only
consider such morphisms. By adding leaf labels, any morphism can be extended to one that
is leaf saturated, without affecting the target forest algebra.

We begin by outlining our proof strategy for Proposition 8.2. We will consider three
possible cases. First, in Section 8.1, we see what happens when some inner node label $b \in B$
has the property that $v$ cannot be reached from $v\alpha(b)$. Then, in Section 8.2, we see what
happens if $H \setminus X$ intersects more than one forest component, i.e. contains at least two forest
types that are not mutually reachable. Finally, in Section 8.3, we show that if neither of
the above holds, then the formula $\varphi$ in Proposition 8.2 can basically be replaced by either
"true" or "false".

8.1. For some inner node label $b \in B$, $v$ is not reachable from $v\alpha(b)$. We begin
with this case, which is the easiest of the three. The basic idea is that we cut the forest
into two parts, by looking at the first occurrence of $b$ on each path, beginning at the root.
Since after reading the label $b$, the context type $v$ is no longer reachable, we can use the
induction assumption to calculate the subtree below each such first $b$. These subtrees can
then be squashed into single leaves using the antichain composition principle, and therefore
the induction assumption can be used on a smaller alphabet of inner node labels, which
now no longer contains $b$.

We say that two forest types $h, g \in H$ are $v$-equivalent if $vuh = vug$ holds for every
context type $u$ such that $v$ is not reachable from $vu$.

Lemma 8.3. For each $h$, the set of forests whose type is $v$-equivalent to $h$ is forest-definable
modulo $X$.

Proof. Fix some context type $u$ such that $v$ is not reachable from $vu$. By induction
assumption (the third parameter is decreased), the set of forests $s$ satisfying $vu\alpha(s) = vuh$
is forest-definable modulo $X$. The set in the statement of the lemma is the intersection,
over $u$, of all these sets. □

Lemma 8.4. If $v, w, vu \in V$ are in the same context component, then so is $wu$.

Proof. By assumption there must be context types $v', w'$ with $vuw' = w$ and $wv' = v$. But
then we have $wv'uw' = w$. In particular, $w(v'uw')^\omega v' = v$. Using identity (6.2), we get

    $v = w(v'uw')^\omega v' = w(v'uw')^\omega uw' (v'uw')^\omega v' = wuw' (v'uw')^\omega v'$,

which shows $v$ can be reached from $wu$. □

Let $\gamma_1, \ldots, \gamma_n$ be all the equivalence classes of $v$-equivalence. For each such class $\gamma_i$, let
$L_i$ be the set of trees $\{bt : \alpha(t) \in \gamma_i\}$. Thanks to Lemma 8.3, each set $L_i$ is tree-definable.
For any $i = 1, \ldots, n$, let $h_i$ be an arbitrarily chosen forest type in the class $\gamma_i$, and let $a_i$
be a leaf label whose type is $\alpha(b)h_i$. The label $a_i$ exists by assumption on leaf saturation.
Note that $a_i$ may have a different type than some of the trees in $L_i$, since $h_i$ need not be
the only forest type in $\gamma_i$. However, we will show that no information is lost by squashing
a subtree in $L_i$ into a single leaf with label $a_i$, at least as long as the resulting forest is going
to be an argument of $v$. More formally, we show:

Lemma 8.5. Let $\varphi = b \wedge \neg\mathrm{F}^{-1} b$, i.e. "a $b$ without $b$ ancestors". For any forest $t$ we have

    $v\alpha(t) = v\alpha(t[(L_1, \varphi) \to a_1, \ldots, (L_n, \varphi) \to a_n])$.

Before we show this lemma, we show how it concludes the case considered in this section.
Recall that we want to show that the following language is forest-definable modulo $X$:

    $L = \{t : v\alpha(t) = h\}$.

By Lemma 8.5, this is the same language as

    $\{t : t[(L_1, \varphi) \to a_1, \ldots, (L_n, \varphi) \to a_n] \in L\}$.

Since the substitution operation removes all letters $b$ from the forest, we get

    $L = \{t : t[(L_1, \varphi) \to a_1, \ldots, (L_n, \varphi) \to a_n] \in K\}$,

where $K$ is the set of trees in $L$ that do not use the letter $b$. To $K$ we can apply the induction
assumption on a smaller alphabet, and then use the antichain composition principle to
transfer definability from $K$ to $L$.

We now resume with the proof of Lemma 8.5.

Proof. Note first that the tree on the right hand side of the equation is well defined, since the
languages $L_i$ are disjoint, and $\varphi$ is an antichain formula. The proof is by induction on the
number of $b$ nodes in the forest $t$. The induction base, where there are no $b$'s, is immediate
since the substitution on the right hand side does not change the forest. Otherwise, let $t$
be of the form $pbs$, with the context $p$ not containing any $b$'s on the main path, and let $L_i$
be such that $bs \in L_i$. By induction assumption, we have

    $v\alpha(pa_i[(L_1, \varphi) \to a_1, \ldots, (L_n, \varphi) \to a_n]) = v\alpha(pa_i)$.

By definition of the substitution we have

    $t[(L_1, \varphi) \to a_1, \ldots, (L_n, \varphi) \to a_n] = pa_i[(L_1, \varphi) \to a_1, \ldots, (L_n, \varphi) \to a_n]$,

so it remains to show that $v\alpha(pa_i) = v\alpha(pbs)$.

First, we claim that $v$ is not reachable from $v\alpha(pb)$. Indeed, if $v$ is not reachable from
$v\alpha(p)$ then we are done. Otherwise, $v$ and $v\alpha(p)$ are in the same context component. If this
context component also contained $v\alpha(pb)$, then by Lemma 8.4 it would also contain
$v\alpha(b)$, a contradiction with the assumption on $b$.

Recall now the forest type $h_i$ that represented the equivalence class $\gamma_i \ni \alpha(s)$. By
assumption on $\alpha(s)$ and $h_i$ being $v$-equivalent, we get

    $v\alpha(pbs) = v\alpha(pb)\alpha(s) = v\alpha(pb)h_i = v\alpha(p)\alpha(b)h_i = v\alpha(p)\alpha(a_i) = v\alpha(pa_i)$. □

8.2. There is more than one forest component in $H \setminus X$. We now turn to the second
case in the proof of Proposition 8.2. Let $G \subseteq H$ be a forest component not included in $X$.
We pick $G$ so that no forest type in $G$ can be reached from a forest type outside $X \cup G$.
Intuitively speaking, forest types from $G$ are close to the leaves. The essential idea in this
section is that we will add $G$ to $X$, by squashing each subtree of type $g \in G$ to a single leaf with
the $g$ written in its label. This is done by applying the antichain composition principle.

Let $W \subseteq V$ be the set of context types that preserve $G$, i.e. context types $w$ such that
$g$ is reachable from $wg$ for some $g \in G$. The following lemma, proved the same way as
Lemma 8.4, shows that "some" in the above definition can be replaced by "all".

Lemma 8.6. If g,h,vg are in the same forest component, then so is vh. 

Let $F \subseteq H$ be the set of those forest types $f$ from which a forest type in $G$ can be
reached. In particular, we have

    $G \subseteq F \subseteq H$.

Note that all forest types in $F \setminus X$ are from $G$ by choice of $G$. Furthermore, the inclusion
$F \subseteq H$ is proper, since $H \setminus X$ contains more than one forest component by assumption.
The inclusion $G \subseteq F$ may also be proper, however all forest types in the difference $F \setminus G$
are from $X$.

We say $f \in H$ is a bad brother if for all $g \in G$, we have $f + g \notin G$, i.e. $g$ is not reachable
from $f + g$. Likewise, we say $f \in H$ is a good brother if for all $g \in G$, we have $f + g \in G$,
i.e. $g$ is reachable from $f + g$. Note that by definition of $F$, all good brothers are in $F$.
Clearly $f$ is a bad brother if and only if the context type $f + \Box$ is outside $W$. Therefore by
Lemma 8.6, every forest type in $H$ is either a good brother or a bad brother. In particular,
all forest types in $G$ are good brothers, since they cannot be bad brothers by $g + g = g$.
Furthermore, since $W$ is closed under context composition, good brothers are closed under
forest concatenation, i.e. form a subsemigroup of $H$.

We fix the sets $G$, $F$ and $W$ for the rest of Section 8.2.

A twig is a tree of depth exactly two, i.e. a root and some leaves. A twig node is a node 
whose subtree is a twig. 

Lemma 8.7. There is a formula $\varphi$ such that in any $X$-trimmed tree, $\varphi$ holds in nodes with
a subtree of type in $G$.

Proof. Let $t$ be an $X$-trimmed tree, and $x$ a node in this tree. If the node is a leaf, then
the type of its subtree can be read from the label. Otherwise, the type of the subtree
must be either in $G$ or outside $F$ by assumption on the tree being $X$-trimmed. We claim
that the following condition is necessary and sufficient for the subtree of $x$ to have a type
outside $F$, and can furthermore be tested by a formula of EF + F$^{-1}$. The condition is that
some descendant $y$ of $x$, not necessarily proper, is either

(1) A leaf or twig node with a type outside $F$; or

(2) A non-twig inner node with a label $b \in B$ whose type $\alpha(b)$ is outside $W$; or

(3) An inner node whose brother has a leaf label $a$ whose type $\alpha(a)$ is a bad brother.

We begin by showing that these conditions can be tested by an EF + F$^{-1}$ formula.
Testing for (1) is simple. Using EF, we search for a candidate $y$ for the node. If $y$ is a
leaf, we just test its label. Otherwise, we test if $y$ is a twig node (no path of length at
least two). Then we read the label of $y$ and the set of labels in descendants of $y$, which
uniquely determine the type of the subtree of $y$, thanks to idempotency and commutativity,
i.e. identities (6.1). Condition (2) is tested in a similar way. For condition (3) we use EF to go
into a leaf $y$ with a label $a$ whose type is a bad brother. We then test if $y$ has a sibling that
is an inner node (all ancestors of $y$ have an inner node descendant).

We now show that these conditions are sufficient. The first one is clearly sufficient. For
the other two, note that every inner node has a subtree with type outside $X$ by assumption
on the tree being $X$-trimmed. This type must then be either in $G \supseteq F \setminus X$ or outside $F$.
For the second condition, let $bs$ be the subtree of a non-twig inner node, with $s$ a forest.
Since $s$ has depth at least two, its type must be outside $X$, and therefore either outside $F$,
or in $G \supseteq F \setminus X$. In either case, the type of $bs$ is outside $F$. The last condition is shown in
a similar way.

It remains to show that the conditions are necessary. Indeed, assume that the subtree
of $x$ has a type outside $G$. Let $s$ be a minimal subtree below $x$ that has a type outside $F$.
If $s$ is a leaf or a twig, then item (1) must hold. Otherwise $s$ is of the form $b(s_1 + \cdots + s_n)$,
for some label $b \in B$ and trees $s_1, \ldots, s_n$, with at least one tree $s_i$ not being a leaf. By
assumption on the tree being $X$-trimmed, the type of $s_i$ is outside $X$. Since $F \setminus X \subseteq G$,
the type of this $s_i$ is in $G$. If all the types of $s_j$, for $j \neq i$, are good brothers, then the
type of $s_1 + \cdots + s_n$ must belong to $G$ by closure of good brothers under composition, and
therefore case (2) must hold. Finally, we consider the case when the type of some tree $s_j$ is a
bad brother. Since all forest types from $G$ are good brothers, the type of $s_j$ is in $F \setminus G \subseteq X$.
Since the tree is $X$-trimmed, $s_j$ is a single leaf, and thus (3) holds. □

Lemma 8.8. For each $g \in G$, the set of trees with type $g$ is tree-definable modulo $X$.

The general idea is that $(G, W)$ is a (smaller) forest algebra, and therefore the induction
assumption can be applied to languages recognized by $(G, W)$. However, because of bad
brothers, $(G, W)$ does not recognize the language in the lemma. Before we solve
this problem, we show how Lemmas 8.7 and 8.8, along with the antichain composition
principle, conclude the case considered in this section. The idea is that we add all forest
types from $G$ to $X$.

Let $h, v$ be as in the statement of Proposition 8.2. We need to show that the language

\[ L = \{ t : v(\alpha(t)) = h \} \]

is forest-definable modulo $X$. By induction assumption, we know that this language is
forest-definable modulo $X \cup G$. In other words, there is some forest-definable set of forests
$K$ that agrees with $L$ over $(X \cup G)$-trimmed forests. To describe $L$ modulo $X$, we will use
the antichain composition principle.

Let $\psi$ be the formula from Lemma 8.7. Let

\[ \varphi = \psi \wedge \neg \mathrm{F}^{-1} \psi . \]

This formula holds in a node whose subtree has a type in $G$, and the node is closest to
the root for this property. Thanks to the last clause, $\varphi$ is an antichain formula. Let
$G = \{g_1, \ldots, g_n\}$. By assumption that $\alpha$ is leaf saturated, for each $g_i$ there is a leaf label
$a_i \in A$ with $\alpha(a_i) = g_i$. For each $g_i$, let $L_i$ be the set of trees with type $g_i$. Thanks to
Lemma 8.8, each tree language $L_i$ is tree-definable modulo $X$.
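
The role of the clause $\neg \mathrm{F}^{-1} \psi$ can be illustrated by a small sketch. The `Node` class and the function `topmost` below are hypothetical names introduced only for this illustration, and the node property $\psi$ is modelled as an arbitrary Python predicate rather than as a formula of the logic. The sketch selects the nodes that satisfy the predicate and have no proper ancestor satisfying it, so the selected nodes are pairwise incomparable.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    # Hypothetical tree node: a label and an unordered list of children.
    label: str
    children: List["Node"] = field(default_factory=list)


def topmost(forest: List[Node], psi: Callable[[Node], bool]) -> List[Node]:
    """Nodes satisfying psi that have no proper ancestor satisfying psi."""
    selected: List[Node] = []

    def visit(node: Node) -> None:
        if psi(node):
            # Stop here: every descendant has an ancestor satisfying psi.
            selected.append(node)
        else:
            for child in node.children:
                visit(child)

    for root in forest:
        visit(root)
    return selected
```

Because the traversal never descends below a selected node, no selected node is an ancestor of another one, which is the antichain property used in the text.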

It is easy to see that squashing a subtree with type $g_i$ into a single leaf with label $a_i$
does not change the type of the whole tree. More precisely, a forest $t$ has the same value as

\[ t[(L_1, \varphi) \to a_1, \ldots, (L_n, \varphi) \to a_n] . \]

Furthermore, the above forest is $(X \cup G)$-trimmed, at least as long as $t$ was $X$-trimmed. It
follows that over $X$-trimmed forests, $L$ agrees with

\[ \{ t : t[(L_1, \varphi) \to a_1, \ldots, (L_n, \varphi) \to a_n] \in K \} , \]

which is forest-definable thanks to the antichain composition principle. It now remains to
show Lemma 8.8, which we do in the next section.

8.2.1. Trees with type in G. Fix some forest type $g \in G$. Our goal is to show that the set
of trees with type $g$ is tree-definable modulo $X$.

Lemma 8.9. Without loss of generality, we may assume that all forest types in $F$ are good
brothers and all inner node labels $b$ satisfy $\alpha(b) \in W$.

Proof. Recall that all forest types from $G$ are good brothers. In particular, all bad brothers
in $F$ are from $X$, and can therefore only appear in leaves, as long as we are working over
$X$-trimmed forests. Let $A' \subseteq A$ be the set of leaf labels that are mapped by $\alpha$ to a good
brother in $F$. Let $B' \subseteq B$ be the set of inner node labels $b$ with $\alpha(b) \in W$.
Let $\beta$ be the restriction of $\alpha$ to this smaller alphabet:

\[ \beta : (A', B')^{\Delta} \to (H, V) . \]

Note that over $X$-trimmed forests, the only forest types from $F$ in the image of $\beta$ are good
brothers, and all inner node labels $b$ satisfy $\alpha(b) \in W$. Assume now that we have shown
Lemma 8.8 for the morphism $\beta$, i.e. the set $K$ of trees that have type $g$ under $\beta$ is
tree-definable modulo $X$. We will use the antichain composition principle to extend this result
to $\alpha$. The idea is that we squash twig nodes into leaves, thus eliminating labels outside
$A', B'$.
Let $\varphi$ be a formula that is true in twig nodes (the node is not a leaf, but all of its
proper descendants are leaves); this is clearly an antichain formula. Let $G = \{g_1, \ldots, g_n\}$.
By assumption that $\alpha$ is leaf saturated, for each $g_i$ there is a leaf label $a_i \in A$ with $\alpha(a_i) = g_i$.
For each $g_i$, let $L_i$ be the set of twig trees with value $g_i$ (under $\alpha$). Each $L_i$ is tree-definable,
since the type of a twig tree is determined by its root label and the set of its leaf labels,
by (6.1). It is easy to see that a tree $t$ over $(A, B)$ has the same type under $\alpha$ as the tree

\[ t[(L_1, \varphi) \to a_1, \ldots, (L_n, \varphi) \to a_n] . \]

Furthermore, if the type of $t$ under $\alpha$ is $g$, then the latter forest belongs to the domain of
$\beta$, since all nodes with a label outside $A'$ or $B'$ are covered by $\varphi$. Therefore, we can use the
antichain composition principle to conclude that the forests with value $g$ under $\alpha$ can be
defined in EF + F$^{-1}$. □
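
The squashing step used in this proof can be sketched as follows. This is only an illustration under stated assumptions: `Node`, `is_twig` and `squash_twigs` are hypothetical names, `twig_type` stands for the map that computes the type of a twig tree from its root label and its set of leaf labels, and `leaf_for_type` stands for the choice of a leaf label $a_i$ for each type $g_i$. All twig nodes of the original tree are replaced in one simultaneous pass, mirroring the substitution above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List


@dataclass
class Node:
    # Hypothetical tree node: a label and an unordered list of children.
    label: str
    children: List["Node"] = field(default_factory=list)


def is_twig(node: Node) -> bool:
    """A twig node is not a leaf, and all of its children are leaves."""
    return bool(node.children) and all(not c.children for c in node.children)


def squash_twigs(
    node: Node,
    twig_type: Callable[[str, FrozenSet[str]], str],
    leaf_for_type: Dict[str, str],
) -> Node:
    """Replace every twig node of the input tree by a leaf of the same type."""
    if is_twig(node):
        leaf_labels = frozenset(c.label for c in node.children)
        return Node(leaf_for_type[twig_type(node.label, leaf_labels)])
    # Leaves and non-twig inner nodes are copied; only original twigs are squashed.
    return Node(
        node.label,
        [squash_twigs(c, twig_type, leaf_for_type) for c in node.children],
    )
```

The twig type is computed from a set of leaf labels, not a list, which reflects the fact that by (6.1) the multiplicity and order of the leaves do not matter.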

From now on, we use the assumptions stated in the previous lemma. Recall that good
brothers are closed under concatenation, and therefore $F$ is a subsemigroup of $H$. This
allows us to define a semigroup automaton $\mathcal{A}$, whose semigroup is $F$. The input alphabet
of this automaton is:

• The inner node labels are $B$.

• The leaf labels are $A' = \{a \in A : \alpha(a) \in F\}$.

For $a \in A'$, we define $\beta_A(a)$ to be $\alpha(a)$. For $b \in B$, we would like the associated function
$\beta_B(b)$ to be $\alpha(b)$. Even though Lemma 8.9 guarantees that $\alpha(b)$ belongs to $W$, this context
type cannot be used since it need not generate a function $F \to F$. The reason is that $\alpha(b)h$
may be outside $F$ for types $h$ outside $G$. To solve this problem, we artificially redefine the
function:

\[ \beta_B(b)(h) = \begin{cases} \alpha(b)h & \text{if } \alpha(b)h \in F \\ g_0 & \text{otherwise.} \end{cases} \qquad (8.2) \]

In the above, $g_0$ is an arbitrarily chosen forest type from $G$.
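
As a minimal sketch of the redefinition in (8.2), forest types can be modelled as strings and the action of a context type as a Python function. The name `restrict_action` and the toy data in the usage note are assumptions made for illustration only.

```python
from typing import Callable, Set


def restrict_action(
    alpha_b: Callable[[str], str], F: Set[str], g0: str
) -> Callable[[str], str]:
    """Turn the action of alpha(b) into a total function F -> F, as in (8.2)."""
    if g0 not in F:
        raise ValueError("the fallback type g0 must belong to F")

    def beta_b(h: str) -> str:
        value = alpha_b(h)
        # First clause: keep the value when it stays inside F.
        # Second clause: otherwise fall back to the fixed type g0.
        return value if value in F else g0

    return beta_b
```

For instance, with `F = {"f1", "f2"}`, `g0 = "f1"` and an action that sends `"f1"` to `"f2"` and `"f2"` to some type outside `F`, the restricted function sends `"f1"` to `"f2"` and `"f2"` to `"f1"`.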

By the proof of Theorem 4.1, this automaton induces a forest algebra morphism

\[ \beta : (A', B')^{\Delta} \to (F, F^F) . \]

This morphism is not the same as $\alpha$, due to the second clause in (8.2). However, it agrees
with $\alpha$ over the forests that are relevant to Lemma 8.8.
Lemma 8.10. For any $g \in G$ and forest $t$, if $\alpha(t) = g$ then $\beta(t) = g$.

Proof. If $t$ has a type in $G$ under $\alpha$, then all of its leaf labels belong to $A'$ by definition of $F$.
Therefore, $t$ belongs to the domain of $\beta$. The lemma is proved by induction on the size of $t$.
If $\alpha(t) = g$, then the "bad" second case in (8.2) is never used while calculating $\beta(t)$. □

Lemma 8.11. The image of $\beta$ satisfies identities (6.1), (6.2) and (6.3).

Proof. We only focus on identity (6.3); the others are easy to show. The key idea is that $\alpha$
and $\beta$ only disagree in twig nodes, and these are not important for the identity (6.3).
Let then $p_1 \dashv p_2$, $q_1 \dashv q_2$ be contexts. We need to show that

\[ \beta((p_1 q_1)^{\omega} (p_2 q_2)^{\omega}) = \beta((p_1 q_1)^{\omega} p_1 q_2 (p_2 q_2)^{\omega}) . \]

Thanks to the faithfulness of contexts in forest algebra, it suffices to show that both sides
induce the same transformations on forests, i.e.

\[ \beta((p_1 q_1)^{\omega} (p_2 q_2)^{\omega} t) = \beta((p_1 q_1)^{\omega} p_1 q_2 (p_2 q_2)^{\omega} t) \]

holds for every forest $t$.

Consider first the case when both $p_2, q_2$ have the hole in the root, and therefore so do
$p_1, q_1$. Writing $p_i = s_i + \Box$ and $q_i = t_i + \Box$, the equality above becomes:

\[ \beta(\omega(s_1 + t_1) + \omega(s_2 + t_2) + t) = \beta(\omega(s_1 + t_1) + s_1 + t_2 + \omega(s_2 + t_2) + t) . \]

The above equality follows by commutativity of the horizontal monoid $F$, and aperiodicity
of $H$, i.e. $\omega h + h = \omega h$. The latter is a consequence of aperiodicity of $V$, itself a consequence
of (6.2), by iterating

\[ \omega h + h = (h + \Box)^{\omega} h = (h + \Box)^{\omega} (h + \Box) h = \omega h + h + h . \]

We can therefore now assume that in the context $p_2 q_2$, at least one inner node is an ancestor
of the hole. Thanks to the assumption on leaf saturation, in the contexts $p_1, q_1, p_2, q_2$ every
subtree that does not contain the hole can be squashed to a single node, without affecting
the image under $\beta$. We therefore assume that in the contexts above, all nodes outside the
main path are leaves. As remarked above, a consequence of equation (6.2) is that $V$ is
aperiodic, i.e. $v^{\omega} = v^{\omega} v$ holds for every context type $v$. Therefore, it is sufficient to show

\[ \beta((p_1 q_1)^{\omega} (p_2 q_2)^{\omega} (p_2 q_2) t) = \beta((p_1 q_1)^{\omega} p_1 q_2 (p_2 q_2)^{\omega} (p_2 q_2) t) . \qquad (8.3) \]

The only part where $\alpha$ and $\beta$ disagree are twig nodes. Thanks to our assumption on the
form of $p_1, p_2, q_1, q_2$, the only place where the forests in (8.3) contain twig nodes is $p_2 q_2 t$.
Therefore, we have

\[ \beta((p_1 q_1)^{\omega} (p_2 q_2)^{\omega} (p_2 q_2) t) = \alpha((p_1 q_1)^{\omega} (p_2 q_2)^{\omega}) \, \beta((p_2 q_2) t) . \]

In the same way we can decompose the right side of (8.3). Applying the assumption that
the image of $\alpha$ satisfies (6.3), we get the desired result. □

Proof of Lemma 8.8. By Lemma 8.10, a tree has type $g$ under $\alpha$ if and only if a) its type
under $\alpha$ belongs to $G$; and b) it has type $g$ under $\beta$. Condition a) can be tested thanks
to Lemma 8.7. Since $F$ is a proper subset of $H$, we can use the induction assumption to
test condition b). □



8.3. The induction base. In this section, we assume that the techniques from the previous 
two sections cannot be applied. That is: 

• All forest types from H \ X are in a single forest component. 

• For all inner node labels $b \in B$, $v$ is reachable from $v\alpha(b)$.

Note that the second assumption does not necessarily mean that any context type reachable
from $v$ is in the same context component. Indeed, it is possible that for some forest type $g$,
the context type $v$ is no longer reachable from $v(\Box + g)$.
We will show

\[ vf = vg \quad \text{for all } f, g \in H \setminus X . \qquad (8.4) \]
Before we do this, we show how Proposition 8.2 follows. For every forest type $h \in H$,
we need to show that the forest language

\[ L = \{ t : v(\alpha(t)) = h \} \]

is forest-definable modulo $X$. By (8.4), there is some forest type $h_0 \in H$ such
that $vf = h_0$ holds for all $f \in H \setminus X$.
• If an $X$-trimmed forest $t$ contains an inner node label (which can easily be tested by the
logic), then $\alpha(t)$ must be in the single forest component $H \setminus X$. In particular, $v\alpha(t) = h_0$.
So in this case, the formula defining $L$ is either "true" or "false", depending on whether $h_0 = h$ or not.

• Otherwise, the forest $t$ is the concatenation of some leaves $a_1 + \cdots + a_n$. In this case, the
type $v\alpha(t)$ can be calculated based on the set of leaf labels in $t$.
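
The case distinction above amounts to a very simple membership test, sketched here under assumptions. The names `Node`, `in_language` and `v_of_leaf_set` are hypothetical: `v_of_leaf_set` stands for the map sending the set of leaf labels of a forest of leaves to the type $v\alpha(t)$, and the input forest is assumed to be $X$-trimmed. The sketch relies on (8.4), which has not been proved at this point of the text.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List


@dataclass
class Node:
    # Hypothetical tree node: a label and an unordered list of children.
    label: str
    children: List["Node"] = field(default_factory=list)


def in_language(
    forest: List[Node],
    h: str,
    h0: str,
    v_of_leaf_set: Callable[[FrozenSet[str]], str],
) -> bool:
    """Decide v(alpha(t)) == h for an X-trimmed forest t, assuming (8.4)."""
    # A forest contains an inner node exactly when some root has a child.
    if any(root.children for root in forest):
        # First case: v(alpha(t)) is the fixed type h0.
        return h0 == h
    # Second case: t is a concatenation of leaves, and only the set of
    # leaf labels matters.
    labels = frozenset(root.label for root in forest)
    return v_of_leaf_set(labels) == h
```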

The rest of this section is devoted to showing (8.4). The following lemma is the key step
in our proof of (8.4). It says that not only can any two forest types $h, g \in H \setminus X$ be reached from
each other (which is the assumption on there being one forest component), but they can
also be reached from each other by only using contexts without any branching. Furthermore,
the context type that goes from $g$ to $h$ can be chosen independently of $g$. However, all these
statements are relative to context types from the context component of $v$.

Lemma 8.12. Let $h \in H \setminus X$. There are inner node labels $b_1, \ldots, b_n \in B$ such that
$wh = w\alpha(b_1 \cdots b_n)g$ holds for each forest type $g \in H \setminus X$ and context type $w$ in the context
component of $v$.

Proof. Let $h$ be a forest type outside $X$. We first show that there is a context type $u_h$ such
that $h = u_h f$ holds for every forest type $f \in H$. By assumption on there being only one
forest component outside $X$, the forest type $h$ can be reached from every forest type. In
particular, there is some context type $u$ such that $h = u(h_1 + \cdots + h_n)$, where $h_1, \ldots, h_n$
are all the forest types in $H$. Let

\[ u_h = u(h_1 + \cdots + h_n + \Box) . \]

Thanks to idempotency and commutativity of $H$, i.e. identity (6.1),

\[ h_1 + \cdots + h_n = h_1 + \cdots + h_n + f \]

holds for any forest type $f$, and therefore also $h = u_h f$.
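
The absorption property just used holds in any finite idempotent and commutative semigroup: the sum of all elements absorbs every element. The following sketch checks it on a toy example, the nonempty subsets of a two-letter set under union. This example and the names `total` and `plus` are assumptions chosen only for illustration; they are not the monoid $H$ of the paper.

```python
from functools import reduce
from itertools import combinations
from typing import Callable, FrozenSet, List

Elem = FrozenSet[str]


def total(elements: List[Elem], plus: Callable[[Elem, Elem], Elem]) -> Elem:
    """Sum of all elements; well defined when plus is associative and commutative."""
    return reduce(plus, elements)


# Toy idempotent commutative semigroup: nonempty subsets of {a, b} with union.
H: List[Elem] = [frozenset(s) for r in (1, 2) for s in combinations("ab", r)]


def plus(x: Elem, y: Elem) -> Elem:
    return x | y


# The sum of all elements absorbs each element f, since f already occurs in it.
absorbs_everything = all(plus(total(H, plus), f) == total(H, plus) for f in H)
```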
We can decompose the context $u_h$ as

\[ u_h = (f_1 + \alpha(b_1)) \cdots (f_n + \alpha(b_n)) \]

for some $n$ and $f_1, \ldots, f_n \in H$ and $b_1, \ldots, b_n \in B$. (In general, some of the $f_i$ may be
empty; but the proof follows the same lines.) Let us denote $\alpha(b_i)$ by $v_i$. We will show that

\[ wh = w v_1 \cdots v_n g \]

holds for any forest type $g$ and any context type $w$ in the context component of $v$, thus
proving the lemma.

Let then $g, w$ be as above. As for $h$, we can define a context type $u_g$ such that $g = u_g f$
holds for any forest type $f$. This context can also be decomposed as

\[ u_g = (f_{n+1} + \alpha(b_{n+1})) \cdots (f_m + \alpha(b_m)) \]

for some $m \geq n + 1$ and $f_{n+1}, \ldots, f_m \in H$ and $b_{n+1}, \ldots, b_m \in B$. As previously, we denote
$\alpha(b_i)$ by $v_i$. By definition, we have

\[ v_1 \cdots v_n \dashv u_h , \qquad v_{n+1} \cdots v_m \dashv u_g . \qquad (8.5) \]

Let now $w \in V$ be in the same context component as $v$. By assumption on $w$ and
Lemma 8.4, also the context type $w v_1 \cdots v_m$ is in the same context component as $v$. In
particular, there is some $w' \in V$ such that

\[ w v_1 \cdots v_m w' = w . \]

By iterating the above $\omega$ times, and appending $h$, we get

\[ wh = w (v_1 \cdots v_m w')^{\omega} h . \]

Since $u_h f = h$ holds for all forest types $f$, the above can be rewritten as

\[ w (v_1 \cdots v_m w')^{\omega} (u_h u_g w')^{\omega} h . \]

Using the property from identity (6.3), with (8.5) supplying its side conditions, we get

\[ w (v_1 \cdots v_m w')^{\omega} (u_h u_g w')^{\omega} h = w (v_1 \cdots v_m w')^{\omega} v_1 \cdots v_n u_g w' (u_h u_g w')^{\omega} h \]

\[ = w (v_1 \cdots v_m w')^{\omega} v_1 \cdots v_n g = w v_1 \cdots v_n g , \]

which concludes the proof of the lemma. □

We now use the above lemma to conclude the proof of (8.4). Indeed, let $f, g$ be forest
types outside $X$. By the above lemma, there are inner node labels $b_1, \ldots, b_m \in B$ such that

\[ wf = w\alpha(b_1 \cdots b_n)h , \qquad wg = w\alpha(b_{n+1} \cdots b_m)h \]

holds for all $w$ in the context component of $v$ and all forest types $h$ outside $X$. Let $v_i = \alpha(b_i)$.
By assumption on the equivalence class of $v$ and by Lemma 8.4, there must be some $v' \in V$
such that

\[ v v_1 \cdots v_m v' = v . \]

But then we have

\[ vf = v (v_1 \cdots v_m v')^{\omega} f = v (v_1 \cdots v_m v')^{\omega} v_{n+1} \cdots v_m v' (v_1 \cdots v_m v')^{\omega} f = v (v_1 \cdots v_m v')^{\omega} g = vg . \]

The second equality follows from (6.2).

9. Empty forests 

The forest algebra setting used in this paper does not allow empty forests. There is also 
a two-sorted alphabet $(A, B)$, where letters from $A$ are only allowed in leaves, and letters
from $B$ are only allowed in inner nodes. A different, and arguably more elegant, setting is
considered in [6], where empty forests are allowed, and only one alphabet is used.

Why do we not use the forest algebra with empty forests here? The reason is that the 
completeness proof in Proposition 8.2 uses an induction on the size of the leaf alphabet, so it
helps that the leaf alphabet is part of the definition of the forest algebra. The assumption on 
nonempty forests follows, since if we want a separate alphabet for leaves, there are algebraic 
reasons to consider forest algebras without the empty forest. A natural question emerges: 
does our characterization also work for forest algebra with empty forests? In this section, 
we give an informal argument that the answer to this question is yes. 

We will not give a detailed discussion of forest algebra with empty forests here. We 
only define the syntactic object. The interested reader is referred to [6]. Let $A$ be
an alphabet. We define $A_H$ (respectively, $A_V$) to be the set of (possibly empty) forests
(respectively, contexts) labeled by $A$, without any restriction on labels in leaves or inner
nodes. We write $A^{\Delta}$ for the pair $(A_H, A_V)$. The only difference between $A^{\Delta}$ and $(A, A)^{\Delta}$
is that the second does not allow the empty forest on its first coordinate. It is not hard to
see that A^ is a forest algebra, as defined in Section HI Given a set L of forests, possibly 
including the empty forest, the syntactic forest algebra with empty forests of $L$ is defined to
be the quotient of $A^{\Delta}$ under the two-sorted equivalence relation defined below.

\[ t \sim t' \quad \text{iff} \quad \forall p \in A_V \;\; pt \in L \Leftrightarrow pt' \in L \]

\[ p \sim p' \quad \text{iff} \quad \forall q \in A_V, s \in A_H \;\; qps \in L \Leftrightarrow qp's \in L \]

This equivalence relation is a refinement of the Myhill-Nerode equivalence introduced in 
Section m (for the case when A = B). It may possibly distinguish more contexts because 
the variable s can also quantify over the empty forest. 

Theorem 9.1. Let $L$ be a forest language. Let $(H, V)$ be its syntactic forest algebra, and let
$(H', V')$ be its syntactic forest algebra with empty forests. If $(H, V)$ satisfies the identities
from Theorem 6.2, then so does $(H', V')$, and vice versa.

Proof. We begin with the right to left implication. Since $(A, A)^{\Delta}$ is a subalgebra of $A^{\Delta}$,
and since the equivalence relation defining $(H', V')$ is a refinement of the equivalence
relation defining $(H, V)$, it follows that $(H, V)$ is a subalgebra of $(H', V')$. In particular, any
identities that hold in the latter must also hold in the former.

For the left to right implication, assume that $(H, V)$ satisfies the identities from
Theorem 6.2. By the theorem, the recognized language $L$ is forest-definable in EF + F$^{-1}$. To
conclude, we will show that if a language $L$ is forest-definable in EF + F$^{-1}$, then its syntactic
forest algebra with empty forests $(H', V')$ satisfies the identities from Theorem 6.2. This
follows by the correctness argument presented in Section 7. The reason why we can use
that argument is that it relied on Proposition 7.3 to transfer Duplicator strategies, and this
proposition also works for the more general one-sorted morphisms that are appropriate for
forest algebras with empty forests. □



10. One quantifier alternation 

In [13], it was shown that over words, the temporal logic F + F$^{-1}$ has the same expressive
power as $\Sigma_2 \cap \Pi_2$, where

• $\Sigma_2$ are word properties definable by a first-order formula with quantifier prefix $\exists^* \forall^*$; the
signature contains label tests and the left-to-right order on word positions.

• $\Pi_2$ are complements of $\Sigma_2$.

For instance, consider the word language $b^* a A^*$ over the alphabet $A = \{a, b, c\}$. This
language can be defined in F + F$^{-1}$ by the formula

\[ \mathrm{F}(a \wedge \neg \mathrm{F}^{-1} \neg b) . \]

This language can also be defined both in $\Sigma_2$ and $\Pi_2$, as witnessed by the formulas:

\[ \exists x \forall y \;\; a(x) \wedge (y < x \Rightarrow b(y)) \;\in\; \Sigma_2 \]

\[ \forall x \exists y \;\; c(x) \Rightarrow (y < x \wedge a(y)) \;\in\; \Pi_2 . \]
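
The agreement between the temporal formula, the $\Sigma_2$ formula and the regular expression can be checked by brute force on short words. The sketch below is only an illustration; it assumes that the outer F of the temporal formula is read as "at some position of the word", and the function names are hypothetical.

```python
import re
from itertools import product

ALPHABET = "abc"


def temporal(word: str) -> bool:
    # F(a and not F^{-1} not b): some position carries a, and no strictly
    # earlier position carries a label other than b.
    return any(
        word[i] == "a" and not any(word[j] != "b" for j in range(i))
        for i in range(len(word))
    )


def sigma2(word: str) -> bool:
    # exists x, forall y: a(x) and (y < x implies b(y)).
    positions = range(len(word))
    return any(
        all(word[x] == "a" and (not y < x or word[y] == "b") for y in positions)
        for x in positions
    )


def regex(word: str) -> bool:
    # The language b* a A* over A = {a, b, c}.
    return re.fullmatch(r"b*a[abc]*", word) is not None


def agree(max_len: int) -> bool:
    """Check that the three descriptions coincide on all words up to max_len."""
    return all(
        temporal(w) == sigma2(w) == regex(w)
        for k in range(max_len + 1)
        for w in map("".join, product(ALPHABET, repeat=k))
    )
```

A call such as `agree(5)` compares the three descriptions on every word of length at most five.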

Both classes S2 and 112 can be extended to trees using the descendant order on tree nodes. 
We show here that the result from [13] fails for trees: 

Proposition 10.1. Over trees, the classes EF + F$^{-1}$ and $\Sigma_2 \cap \Pi_2$ have incomparable
expressive power. Likewise for forests.

As mentioned in the introduction, the class $\Sigma_2 \cap \Pi_2$ was given an effective characterization
in [3]. We prove the above proposition for forests; the case for trees is done the same way.
The inequality

\[ \text{EF} + \text{F}^{-1} \not\supseteq \Sigma_2 \cap \Pi_2 \]

is witnessed by the language "three nodes with label a", which cannot be defined in EF + F$^{-1}$
by virtue of (6.1). To show the remaining inequality

\[ \text{EF} + \text{F}^{-1} \not\subseteq \Sigma_2 \cap \Pi_2 \]

we will demonstrate in the following lemma that the forest property "no root node is a leaf"
cannot be defined in $\Sigma_2$, although it is forest-definable in EF + F$^{-1}$.

Lemma 10.2. Let $a$ be a leaf label, $b$ an inner node label, and $\varphi$ be a formula of the form

\[ \exists x_1 \ldots x_i \, \forall y_1 \ldots y_j \; \psi(x_1, \ldots, x_i, y_1, \ldots, y_j) \in \Sigma_2 , \]

with $\psi$ quantifier-free. Let $n > i + j$. If $n(ba)$ satisfies $\varphi$, then so does $n(ba) + a$.

Proof. Assume then that $n(ba)$ satisfies $\varphi$. We need to show that $n(ba) + a$ does too. For
$x_1, \ldots, x_i$, we pick the same nodes in $n(ba) + a$ as the nodes in $n(ba)$ that witnessed $\varphi$. We
need to show that for any assignment of the nodes $y_1, \ldots, y_j$ in $n(ba) + a$ that makes $\psi$
false, we can also find an assignment in $n(ba)$ that makes $\psi$ false. The key point is that any
assignment of $x_1, \ldots, x_i, y_1, \ldots, y_j$ in $n(ba) + a$ must leave at least one copy of $ba$ without
any variables; this copy can be used in $n(ba)$ to simulate $a$. □
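
To see concretely that the property "no root node is a leaf" separates $n(ba)$ from $n(ba) + a$, here is a minimal sketch of an evaluator for the two operators, following their reading from the abstract (EF looks at proper descendants, F$^{-1}$ at proper ancestors). Several points are assumptions made for illustration: all names are hypothetical, formulas are modelled as Python predicates on nodes, and a forest is taken to satisfy the property when no node satisfies "has no proper descendant and no proper ancestor".

```python
from typing import Callable, Iterator, List, Optional


class Node:
    """Hypothetical tree node with a parent pointer."""

    def __init__(self, label: str, *children: "Node") -> None:
        self.label = label
        self.children: List[Node] = list(children)
        self.parent: Optional[Node] = None
        for child in self.children:
            child.parent = self


Formula = Callable[[Node], bool]


def descendants(node: Node) -> Iterator[Node]:
    """Proper descendants of a node."""
    for child in node.children:
        yield child
        yield from descendants(child)


def ancestors(node: Node) -> Iterator[Node]:
    """Proper ancestors of a node."""
    while node.parent is not None:
        node = node.parent
        yield node


def EF(phi: Formula) -> Formula:
    return lambda n: any(phi(d) for d in descendants(n))


def F_inv(phi: Formula) -> Formula:
    return lambda n: any(phi(a) for a in ancestors(n))


def TRUE(n: Node) -> bool:
    return True


def leaf_root(n: Node) -> bool:
    # The node has no proper descendant and no proper ancestor.
    return not EF(TRUE)(n) and not F_inv(TRUE)(n)


def no_root_is_leaf(forest: List[Node]) -> bool:
    nodes = [m for root in forest for m in [root, *descendants(root)]]
    return not any(leaf_root(m) for m in nodes)


def copies_of_ba(n: int) -> List[Node]:
    """The forest n(ba): n copies of the tree with root b and one leaf a."""
    return [Node("b", Node("a")) for _ in range(n)]
```

With these definitions, `no_root_is_leaf(copies_of_ba(3))` holds, while adding a single leaf root, as in `copies_of_ba(3) + [Node("a")]`, makes it fail.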

11. Closing remarks 

The contribution of this paper is a characterization of languages definable in EF + F$^{-1}$.
This characterization is expressed in terms of identities that must be satisfied in the syntactic
algebra. A corollary of this characterization is an algorithm for deciding if a given regular
language can be expressed in EF + F$^{-1}$. The algorithm runs in polynomial time if the input
is given as a forest algebra.

As mentioned in the introduction, there are many open problems waiting to be solved
in this field. Of those closely related to EF + F$^{-1}$, the following look interesting:

• What are the identities for two-variable first-order logic with the descendant relation?
The question boils down to: what identity should replace idempotency $h + h = h$? Here
is one candidate: $v(h + h) + vh = vh + vh$.

• What are the identities for an extension of EF + F$^{-1}$, where we allow operators of the
form $\mathrm{EF}^k \varphi$, with the meaning: "the current node has $k$ incomparable descendants where
$\varphi$ holds"? This seems to be a reasonable extension of EF + F$^{-1}$ that is capable of counting
in a proper way (recall that two-variable logic could express the property "there are two
a's", but not the property "there are three a's").

It is conceivable that a modification of the techniques developed in this paper can be
sufficient to characterize the above two logics. For other logics mentioned in this paper, such as
full first-order logic, or even variants of EF + F$^{-1}$ with horizontal order, new techniques need
to be developed.